
Quickstart with dbt Mesh


    Introduction

    dbt Mesh is a framework that helps organizations scale their teams and data assets effectively. It promotes governance best practices and breaks large projects into manageable sections for faster data development. dbt Mesh is available for dbt Cloud Enterprise accounts.

    This guide teaches you how to set up a multi-project design using the foundational concepts of dbt Mesh and how to implement a data mesh in dbt Cloud:

    • Set up a foundational project called “Jaffle | Data Analytics”
    • Set up a downstream project called “Jaffle | Finance”
    • Add model access, versions, and contracts
    • Set up a dbt Cloud job that is triggered on completion of an upstream job

    For more information on why data mesh is important, read this post: What is data mesh? The definition and importance of data mesh.


    Prerequisites

    To leverage dbt Mesh, you need the following:

    • A dbt Cloud account on the Enterprise plan
    • Projects running dbt v1.6 or later (cross-project references require it)

    This guide assumes you have experience with or fundamental knowledge of dbt. Take the dbt Fundamentals course first if you are brand new to dbt.

    Create and configure two projects

    In this section, you'll create two new, empty projects in dbt Cloud to serve as your foundational and downstream projects:

    • Foundational projects (or upstream projects) typically contain core models and datasets that serve as the base for further analysis and reporting.
    • Downstream projects build on these foundations, often adding more specific transformations or business logic for dedicated teams or purposes.

    For example, the always-enterprising and fictional account "Jaffle Labs" will create two projects for their data analytics and finance teams: Jaffle | Data Analytics and Jaffle | Finance.

    Create two new dbt Cloud projects named 'Jaffle | Data Analytics' and 'Jaffle | Finance'

    To create a new project in dbt Cloud:

    1. From Account settings, click + New Project.
    2. Enter a project name and click Continue.
      • Use "Jaffle | Data Analytics" for one project
      • Use "Jaffle | Finance" for the other project
    3. Select your data platform, then click Next to set up your connection.
    4. In the Configure your environment section, enter the Settings for your new project.
    5. Click Test Connection. This verifies that dbt Cloud can access your data platform account.
    6. Click Next if the test succeeded. If it fails, you might need to go back and double-check your settings.
      • For this guide, make sure you create a single development environment and a single deployment environment per project.
        • For "Jaffle | Data Analytics", set the default database to jaffle_da.
        • For "Jaffle | Finance", set the default database to jaffle_finance.
    Navigate to 'Account settings' and then click + 'New Project' to create new projects in dbt Cloud
    7. Continue the prompts to complete the project setup. Once configured, each project should have:
      • A data platform connection
      • New git repo
      • One or more environments (such as development, deployment)

    Create a production environment

    In dbt Cloud, each project can have one deployment environment designated as "Production". You must set up a "Production" or "Staging" deployment environment for each project you want to "mesh" together. This enables you to leverage dbt Explorer in the later steps of this guide.

    To set a production environment:

    1. Navigate to Deploy -> Environments, then click Create New Environment.
    2. Select Deployment as the environment type.
    3. Under Set deployment type, select the Production button.
    4. Select the dbt version.
    5. Continue filling out the fields as necessary in the Deployment connection and Deployment credentials sections.
    6. Click Test Connection to confirm the deployment connection.
    7. Click Save to create a production environment.
    Set your production environment as the default environment in your Environment Settings

    Set up a foundational project

    This upstream project is where you build your core data assets. This project will contain the raw data sources, staging models, and core business logic.

    dbt Cloud enables data practitioners to develop in their tool of choice and comes equipped with a local dbt Cloud CLI or in-browser dbt Cloud IDE.

    In this section of the guide, you will set the "Jaffle | Data Analytics" project as your foundational project using the dbt Cloud IDE.

    1. First, navigate to the Develop page to verify your setup.
    2. Click Initialize dbt project if you’ve started with an empty repo:
    Initialize repo
    3. Delete the models/example folder.
    4. Navigate to the dbt_project.yml file and rename the project (line 5) from my_new_project to analytics.
    5. In your dbt_project.yml file, remove lines 39-42 (the my_new_project model reference). A sketch of the result follows this list.
    6. In the File Explorer, hover over the project directory and click the ..., then select Create folder.
    7. Create two new folders: models/staging and models/core.
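
    If you want to sanity-check those dbt_project.yml edits, the relevant parts of the file should now look roughly like this sketch of the default scaffold (your generated file may differ slightly):

      dbt_project.yml (abridged sketch)

      name: 'analytics'   # renamed from 'my_new_project' on line 5
      version: '1.0.0'
      profile: 'default'

      # The generated models: block that referenced my_new_project
      # (lines 39-42) has been deleted.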

    Staging layer

    Now that you've set up the foundational project, let's start building the data assets. Set up the staging layer as follows:

    1. Create a new YAML file models/staging/sources.yml.

    2. Declare the sources by copying the following into the file and clicking Save.

      models/staging/sources.yml

      version: 2

      sources:
        - name: jaffle_shop
          description: This is a replica of the Postgres database used by our app
          database: raw
          schema: jaffle_shop
          tables:
            - name: customers
              description: One record per customer.
            - name: orders
              description: One record per order. Includes cancelled and deleted orders.
    3. Create a models/staging/stg_customers.sql file to select from the customers table in the jaffle_shop source.

      models/staging/stg_customers.sql

      select
          id as customer_id,
          first_name,
          last_name

      from {{ source('jaffle_shop', 'customers') }}
    4. Create a models/staging/stg_orders.sql file to select from the orders table in the jaffle_shop source.

      models/staging/stg_orders.sql

      select
          id as order_id,
          user_id as customer_id,
          order_date,
          status

      from {{ source('jaffle_shop', 'orders') }}
    5. Create a models/core/fct_orders.sql file to build a fact table with customer and order details.

      models/core/fct_orders.sql

      with customers as (
          select *
          from {{ ref('stg_customers') }}
      ),

      orders as (
          select *
          from {{ ref('stg_orders') }}
      ),

      customer_orders as (
          select
              customer_id,
              min(order_date) as first_order_date
          from orders
          group by customer_id
      ),

      final as (
          select
              o.order_id,
              o.order_date,
              o.status,
              c.customer_id,
              c.first_name,
              c.last_name,
              co.first_order_date,
              -- Note that we've used a macro for this so that the appropriate DATEDIFF syntax is used for each respective data platform
              {{ dbt_utils.datediff('first_order_date', 'order_date', 'day') }} as days_as_customer_at_purchase
          from orders o
          left join customers c using (customer_id)
          left join customer_orders co using (customer_id)
      )

      select * from final
    6. Navigate to the Command bar and execute dbt build.
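
    Note that fct_orders.sql calls the dbt_utils.datediff macro, so the dbt_utils package must be installed in this project for dbt build to succeed. If your scaffold doesn't already include it, add a packages.yml like the following (the pinned version mirrors the one used later in this guide) and run dbt deps before building:

      packages.yml

      packages:
        - package: dbt-labs/dbt_utils
          version: 1.1.1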

    Before a downstream team can leverage assets from this foundational project, you need to first:

    • Create and define at least one model as “public”
    • Run a deployment job successfully
      • Note: Enable the Generate docs on run toggle for this job to update dbt Explorer. Once the job has run, you can click Explore from the upper menu bar and see your lineage, tests, and documentation coming through successfully.

    Define a public model and run first job

    In the previous section, you arranged your basic building blocks; now let's integrate dbt Mesh.

    Although the Finance team requires the fct_orders model for analyzing payment trends, other models, particularly those in the staging layer used for data cleansing and joining, are not needed by downstream teams.

    To make fct_orders publicly available:

    1. In the models/core/core.yml file, add an access: public property to the fct_orders model by adding and saving the following:

      models/core/core.yml

      version: 2

      models:
        - name: fct_orders
          access: public
          description: "Customer and order details"
          columns:
            - name: order_id
              data_type: number
              description: ""

            - name: order_date
              data_type: date
              description: ""

            - name: status
              data_type: varchar
              description: "Indicates the status of the order"

            - name: customer_id
              data_type: number
              description: ""

            - name: first_name
              data_type: varchar
              description: ""

            - name: last_name
              data_type: varchar
              description: ""

            - name: first_order_date
              data_type: date
              description: ""

            - name: days_as_customer_at_purchase
              data_type: number
              description: "Days between this purchase and customer's first purchase"

    Note: By default, model access is set to "protected", which means a model can only be referenced within its own project. Learn more about access types and model groups here.
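
    For contrast, here's a hypothetical sketch of spelling out that default on a staging model; you rarely need to write this, since protected is already the default:

      models:
        - name: stg_orders
          access: protected  # the default: only models in this same project can ref it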

    2. Navigate to the dbt Cloud IDE Lineage tab to see the model noted as Public, below the model name.
    Jaffle | Data Analytics lineage
    3. Go to Version control and click the Commit and Sync button to commit your changes.
    4. Merge your changes to the main or production branch.

    Create and run a dbt Cloud job

    Before a downstream team can leverage assets from this foundational project, you need to create a production environment and run a deployment job successfully.

    To run your first deployment job, create a new dbt Cloud job.

    1. Click Deploy and then Jobs.
    2. Click Create job and then Deploy job.
    3. Select the Generate docs on run option. This will reflect the state of this project in the Explore section.
     Select the 'Generate docs on run' option when configuring your dbt Cloud job.
    4. Then, click Run now to trigger the job.

      Trigger a job by clicking the 'Run now' button.
    5. After the run is complete, click Explore from the upper menu bar. You should now see your lineage, tests, and documentation coming through successfully.

    Reference a public model in your downstream project

    In this section, you will set up the downstream project, "Jaffle | Finance", and add a cross-project reference to the fct_orders model from the foundational project. Navigate to the Develop page to set up the project:

    1. If you’ve also started with a new git repo, click Initialize dbt project under the Version control section.
    2. Delete the models/example folder.
    3. Navigate to the dbt_project.yml file and remove lines 39-42 (the my_new_project model reference).
    4. In the File Explorer, hover over the project directory, click the ..., then select Create file.
    5. Name the file dependencies.yml.
    Create file in the dbt Cloud IDE.
    6. Add the upstream analytics project and the dbt_utils package. Click Save.

      dependencies.yml

      packages:
        - package: dbt-labs/dbt_utils
          version: 1.1.1

      projects:
        - name: analytics

    Staging layer

    Now that you've set up the downstream project, let's start building its data assets. Set up the staging layer as follows:

    1. Create a new YAML file models/staging/sources.yml and declare the sources by copying the following into the file and clicking Save.

      models/staging/sources.yml

      version: 2

      sources:
        - name: stripe
          database: raw
          schema: stripe
          tables:
            - name: payment
    2. Create models/staging/stg_payments.sql to select from the payment table in the stripe source.

      models/staging/stg_payments.sql

      with payments as (
          select * from {{ source('stripe', 'payment') }}
      ),

      final as (
          select
              id as payment_id,
              -- these columns are quoted because they are case-sensitive in the raw source
              "orderID" as order_id,
              "paymentMethod" as payment_method,
              amount,
              created as payment_date
          from payments
      )

      select * from final

    Reference the public model

    You're now set to add a model that explores how payment types vary throughout a customer's journey. This helps determine whether coupon gift cards decrease with repeat purchases, as our marketing team anticipates, or remain consistent.

    1. To reference the public model, create models/core/agg_customer_payment_journey.sql with the following logic:

      models/core/agg_customer_payment_journey.sql

      with stg_payments as (
          select * from {{ ref('stg_payments') }}
      ),

      fct_orders as (
          select * from {{ ref('analytics', 'fct_orders') }}
      ),

      final as (
          select
              days_as_customer_at_purchase,
              -- we use the pivot macro in the dbt_utils package to create columns that total payments for each method
              {{ dbt_utils.pivot(
                  'payment_method',
                  dbt_utils.get_column_values(ref('stg_payments'), 'payment_method'),
                  agg='sum',
                  then_value='amount',
                  prefix='total_',
                  suffix='_amount'
              ) }},
              sum(amount) as total_amount
          from fct_orders
          left join stg_payments using (order_id)
          group by 1
      )

      select * from final
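
      To demystify the pivot: at compile time, dbt_utils.get_column_values queries the distinct payment_method values, and dbt_utils.pivot expands each one into an aggregate expression. Assuming illustrative values like credit_card and coupon, the rendered SQL is roughly:

        sum(case when payment_method = 'credit_card' then amount else 0 end) as total_credit_card_amount,
        sum(case when payment_method = 'coupon' then amount else 0 end) as total_coupon_amount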

    2. Notice the cross-project ref at work! When you add the ref, the dbt Cloud IDE's auto-complete feature recognizes the public model as available.

      Cross-project ref autocomplete in the dbt Cloud IDE
    3. This automatically resolves (or links) to the correct database, schema, and table/view set by the upstream project.

      Cross-project ref compile
    4. You can also see this connection displayed in the live Lineage tab.

      Cross-project ref lineage
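
    Behind the scenes, the two-argument ref resolves against the metadata of the upstream project's Production (or Staging) environment rather than your own project. Given the jaffle_da default database configured earlier, the compiled SQL looks something like this sketch (the schema name shown is illustrative; it comes from the upstream deployment credentials):

      select * from jaffle_da.analytics.fct_orders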

    Add model versions and contracts

    How can you enhance resilience and add guardrails to this type of multi-project relationship? You can adopt best practices from software engineering by:

    1. Defining model contracts: Set up model contracts in dbt to define a set of upfront "guarantees" that describe the shape of your model. While building the model, dbt verifies that the transformation produces a dataset matching its contract; if not, the build fails.
    2. Defining model versions: Use model versions to manage updates and handle breaking changes systematically.

    Set up model contracts

    As part of the Data Analytics team, you may want to ensure the fct_orders model is reliable for downstream users, like the Finance team.

    1. Navigate to models/core/core.yml and under the fct_orders model, add a data contract to enforce reliability:
    models:
      - name: fct_orders
        access: public
        description: "Customer and order details"
        config:
          contract:
            enforced: true
    2. Test what would happen if this contract were violated. In models/core/fct_orders.sql, comment out the o.status line in the final CTE and click Build to try building the model.
      • If the contract is breached, the build fails, as seen in the command bar history.
        The data contract was breached and the dbt build run failed.

    Set up model versions

    In this section, you will set up model versions as the Data Analytics team upgrades the fct_orders model, while offering backward compatibility and a migration notice to the downstream Finance team.

    1. Rename the existing model file from models/core/fct_orders.sql to models/core/fct_orders_v1.sql.

    2. Create a new file models/core/fct_orders_v2.sql and adjust the schema:

      • Comment out orders.status
      • Add a new field, is_return, to indicate whether an order was returned (a sketch of the change follows).
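
      A minimal sketch of the change in the final CTE of models/core/fct_orders_v2.sql, relative to v1. The guide doesn't specify how is_return is derived, so the status values used below are an assumption:

        final as (
            select
                o.order_id,
                o.order_date,
                -- o.status,  -- removed in v2
                o.status in ('returned', 'return_pending') as is_return,  -- assumed derivation of the new flag
                c.customer_id,
                c.first_name,
                c.last_name,
                co.first_order_date,
                {{ dbt_utils.datediff('first_order_date', 'order_date', 'day') }} as days_as_customer_at_purchase
            from orders o
            left join customers c using (customer_id)
            left join customer_orders co using (customer_id)
        )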
    3. Then, add the following to your models/core/core.yml file:

      • The is_return column
      • The two model versions
      • A latest_version to indicate which version is the latest (and should be used by default, unless specified otherwise)
      • A deprecation_date on version 1 to indicate when the downstream team should migrate off of it
    4. It should now read as follows:

    models/core/core.yml

    version: 2

    models:
      - name: fct_orders
        access: public
        description: "Customer and order details"
        latest_version: 2
        config:
          contract:
            enforced: true
        columns:
          - name: order_id
            data_type: number
            description: ""

          - name: order_date
            data_type: date
            description: ""

          - name: status
            data_type: varchar
            description: "Indicates the status of the order"

          - name: is_return
            data_type: boolean
            description: "Indicates if an order was returned"

          - name: customer_id
            data_type: number
            description: ""

          - name: first_name
            data_type: varchar
            description: ""

          - name: last_name
            data_type: varchar
            description: ""

          - name: first_order_date
            data_type: date
            description: ""

          - name: days_as_customer_at_purchase
            data_type: number
            description: "Days between this purchase and customer's first purchase"

        # Declare the versions, and highlight the diffs
        versions:
          - v: 1
            deprecation_date: 2024-06-30 00:00:00.00+00:00
            columns:
              # This means: use the 'columns' list from above, but exclude is_return
              - include: all
                exclude: [is_return]

          - v: 2
            columns:
              # This means: use the 'columns' list from above, but exclude status
              - include: all
                exclude: [status]
    5. Verify how dbt compiles the ref statement based on the updates. Open a new file, add the following select statements, and click Compile. Note how each ref is compiled to the specified version (or the latest version if not specified).

    select * from {{ ref('fct_orders', v=1) }}
    select * from {{ ref('fct_orders', v=2) }}
    select * from {{ ref('fct_orders') }}
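
    By default, dbt aliases each versioned relation with a _v<version> suffix, so the three statements compile to something like the following (database and schema names depend on your environment; note that the unpinned ref resolves to the latest version):

    select * from jaffle_da.analytics.fct_orders_v1
    select * from jaffle_da.analytics.fct_orders_v2
    select * from jaffle_da.analytics.fct_orders_v2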

    Add a dbt Cloud job in the downstream project

    Before proceeding, make sure you commit and merge your changes in both the “Jaffle | Data Analytics” and “Jaffle | Finance” projects.

    A member of the Finance team would like to schedule a dbt Cloud job for their customer payment journey analysis immediately after the data analytics team refreshes their pipelines.

    1. In the “Jaffle | Finance” project, go to the Jobs page by navigating to Deploy and then Jobs.

    2. Then click Create job and then Deploy job.

    3. Add a name for the job, then scroll down to the Job completion section.

    4. In the Job completion section, configure the job to Run when another job finishes and select the upstream job from the “Jaffle | Data Analytics” project.

      Trigger job on completion
    5. Click Save and verify the job is set up correctly.

    6. Go to the “Jaffle | Data Analytics” jobs page. Select the Daily job and click Run now.

    7. Once this job completes successfully, go back to the “Jaffle | Finance” jobs page. You should see that the Finance team’s job was triggered automatically.

    This simplifies the process of staying in sync with the upstream tables and removes the need for more sophisticated orchestration skills, such as coordinating jobs across projects via an external orchestrator.

    View deprecation warning

    To find out how long the Finance team has to migrate from fct_orders_v1 to fct_orders_v2, follow these steps:

    1. In the “Jaffle | Finance” project, navigate to the Develop page.
    2. Edit the cross-project ref to use v=1 in models/core/agg_customer_payment_journey.sql:
    models/core/agg_customer_payment_journey.sql

    with stg_payments as (
        select * from {{ ref('stg_payments') }}
    ),

    fct_orders as (
        select * from {{ ref('analytics', 'fct_orders', v=1) }}
    ),

    final as (
        select
            days_as_customer_at_purchase,
            -- we use the pivot macro in the dbt_utils package to create columns that total payments for each method
            {{ dbt_utils.pivot(
                'payment_method',
                dbt_utils.get_column_values(ref('stg_payments'), 'payment_method'),
                agg='sum',
                then_value='amount',
                prefix='total_',
                suffix='_amount'
            ) }},
            sum(amount) as total_amount
        from fct_orders
        left join stg_payments using (order_id)
        group by 1
    )

    select * from final
    3. In the dbt Cloud IDE, go to Version control to commit and merge the changes.
    4. Go to the Deploy and then Jobs page.
    5. Click Run now to run the Finance job. The agg_customer_payment_journey model will build and display a deprecation date warning.
    The model will display a deprecation date warning.

    View lineage with dbt Explorer

    Use dbt Explorer to view the lineage across projects in dbt Cloud. Navigate to the Explore page for each of your projects; you should now see the lineage seamlessly across projects.

    View 'Jaffle | Data Analytics' lineage with dbt Explorer

    What's next

    Congratulations 🎉! You're ready to bring the benefits of dbt Mesh to your organization. You've learned:

    • How to establish a foundational project, "Jaffle | Data Analytics."
    • How to create a downstream project, "Jaffle | Finance."
    • How to implement model access, versions, and contracts.
    • How to set up dbt Cloud jobs triggered by upstream job completions.
