Skip to main content

Getting started with dbt Core

When you use dbt Core to work with dbt, you will be editing files locally using a code editor, and running projects using the dbt command line interface (dbt CLI). If you'd rather edit files and run projects using the web-based Integrated Development Environment (IDE), you should refer to Getting set up in dbt Cloud.

Prerequisites

Create a starter project

After setting up BigQuery to work with dbt, you are ready to create a starter project with example models, before building your own models.

Create a repository

The following steps use GitHub as the Git provider for this guide, but you can use any Git provider. You should have already created a GitHub account.

  1. Create a new GitHub repository named dbt-tutorial.
  2. Select Public so the repository can be shared with others. You can always make it private later.
  3. Leave the default values for all other settings.
  4. Click Create repository.
  5. Save the commands from "…or create a new repository on the command line" to use later in Commit your changes.

Create a project

Learn how to use a series of commands using the command line of the Terminal to create your project. dbt Core includes an init command that helps scaffold a dbt project.

To create your dbt project:

  1. Make sure you have dbt Core installed and check the version using the dbt --version command:

    dbt --version
  2. Initiate the jaffle_shop project using the init command:

    dbt init jaffle_shop
  3. Navigate into your project's directory:

    cd jaffle_shop
  4. Use pwd to confirm that you are in the right spot:

    $ pwd
    > Users/BBaggins/dbt-projects/jaffle_shop
  5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The content includes folders and .sql and .yml files generated by the init command.

    The starter project in a code editor

    The starter project in a code editor

  6. Update the following values in the dbt_project.yml file:

    dbt_project.yml
    name: jaffle_shop # Change from the default, `my_new_project`

    ...

    profile: jaffle_shop # Change from the default profile name, `default`

    ...

    models:
    jaffle_shop: # Change from `my_new_project` to match the previous value for `name:`
    ...

Connect to BigQuery

When developing locally, dbt connects to your data warehouse using a profile, which is a yaml file with all the connection details to your warehouse.

  1. Create a file in the ~/.dbt/ directory named profiles.yml.

  2. Move your BigQuery keyfile into this directory.

  3. Copy the following and paste into the new profiles.yml file. Make sure you update the values where noted.

    profiles.yml
    jaffle_shop: # this needs to match the profile in your dbt_project.yml file
    target: dev
    outputs:
    dev:
    type: bigquery
    method: service-account
    keyfile: /Users/BBaggins/.dbt/dbt-tutorial-project-331118.json # replace this with the full path to your keyfile
    project: grand-highway-265418 # Replace this with your project id
    dataset: dbt_bbagins # Replace this with dbt_your_name, e.g. dbt_bilbo
    threads: 1
    timeout_seconds: 300
    location: US
    priority: interactive
  4. Run the debug command from your project to confirm that you can successfully connect:

    $ dbt debug
    > Connection test: OK connection ok
    A successful dbt debug command

    A successful dbt debug command

FAQs

 My data team uses a different data warehouse. What should my profiles.yml file look like for my warehouse?
 
 
 
 

Perform your first dbt run

Our sample project has some example models in it. We're going to check that we can run them to confirm everything is in order.

  1. Enter the run command to build example models:

    dbt run

You should have an output that looks like this:

A successful dbt run command

A successful dbt run command

Commit your changes

Commit your changes so that the repository contains the latest code.

  1. Link the GitHub repository you created to your dbt project by running the following commands in Terminal. Make sure you use the correct git URL for your repository, which you should have saved from step 5 in Create a repository.

    git init
    git branch -M main
    git add .
    git commit -m "Create a dbt project"
    git remote add origin https://github.com/USERNAME/dbt-tutorial.git
    git push -u origin main
  2. Return to your GitHub repository to verify your new files have been added.

Build your first models

Now that you set up your sample project, you can get to the fun part — building models! You will take a sample query and turn it into a model in your dbt project.

Checkout a new git branch

Check out a new git branch to work on new code:

  1. Create a new branch by using the checkout command and passing the -b flag:

    $ git checkout -b add-customers-model
    > Switched to a new branch `add-customer-model`

Build your first model

  1. Open your project in your favorite code editor.

  2. Create a new SQL file in the models directory, named models/customers.sql.

  3. Paste the following query into the models/customers.sql file.

    with customers as (

    select
    id as customer_id,
    first_name,
    last_name

    from `dbt-tutorial`.jaffle_shop.customers

    ),

    orders as (

    select
    id as order_id,
    user_id as customer_id,
    order_date,
    status

    from `dbt-tutorial`.jaffle_shop.orders

    ),

    customer_orders as (

    select
    customer_id,

    min(order_date) as first_order_date,
    max(order_date) as most_recent_order_date,
    count(order_id) as number_of_orders

    from orders

    group by 1

    ),

    final as (

    select
    customers.customer_id,
    customers.first_name,
    customers.last_name,
    customer_orders.first_order_date,
    customer_orders.most_recent_order_date,
    coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

    )

    select * from final
  4. From the command line, enter dbt run.

    A successful run with the dbt CLI

    A successful run with the dbt CLI

When you return to the BigQuery console, you can select from this model.

FAQs

 
 
 
 
 

Change the way your model is materialized

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can change things between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes.

By default, everything gets created as a view. You can override that by materializing everything in jaffle_shop as a table. Everything in the example project will still be materialized as a view.

  1. Edit your dbt_project.yml file.

    • Update your project name to:

      dbt_project.yml
      name: 'jaffle_shop'
    • Update your models config block to:

      dbt_project.yml
      models:
      jaffle_shop:
      +materialized: table
      example:
      +materialized: view
    • Click Save.

  2. Enter the dbt run command. Your customers model should now be built as a table!

    info

    To do this, dbt had to first run a drop view statement (or API call on BigQuery), then a create table as statement.

  3. Edit models/customers.sql to override the dbt_project.yml for the customers model only by adding the following snippet to the top, and click Save:

    models/customers.sql
    {{
    config(
    materialized='view'
    )
    }}

    with customers as (

    select
    id as customer_id
    ...

    )

  4. Enter the dbt run command. Your model, customers should now build as a view.

  5. Enter the dbt run --full-refresh command for this to take effect in your warehouse.

FAQs

 
 
 

Delete the example models

You can now delete the files that dbt created when you initialized the project:

  1. Delete the models/example/ directory.

  2. Delete the example: key from your dbt_project.yml file, and any configurations that are listed under it.

    dbt_project.yml
    # before
    models:
    jaffle_shop:
    +materialized: table
    example:
    +materialized: view
    dbt_project.yml
    # after
    models:
    jaffle_shop:
    +materialized: table
  3. Save your changes.

FAQs

 
 

Build models on top of other models

As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs).

Now you can experiment by separating the logic out into separate models and using the ref function to build models on top of other models:

The DAG we want for our dbt project

The DAG we want for our dbt project

  1. Create a new SQL file, models/stg_customers.sql, with the SQL from the customers CTE in our original query.

  2. Create a second new SQL file, models/stg_orders.sql, with the SQL from the orders CTE in our original query.

    models/stg_customers.sql
    select
    id as customer_id,
    first_name,
    last_name

    from `dbt-tutorial`.jaffle_shop.customers
    models/stg_orders.sql
    select
    id as order_id,
    user_id as customer_id,
    order_date,
    status

    from `dbt-tutorial`.jaffle_shop.orders
  3. Edit the SQL in your models/customers.sql file as follows:

    models/customers.sql
    with customers as (

    select * from {{ ref('stg_customers') }}

    ),

    orders as (

    select * from {{ ref('stg_orders') }}

    ),

    customer_orders as (

    select
    customer_id,

    min(order_date) as first_order_date,
    max(order_date) as most_recent_order_date,
    count(order_id) as number_of_orders

    from orders

    group by 1

    ),

    final as (

    select
    customers.customer_id,
    customers.first_name,
    customers.last_name,
    customer_orders.first_order_date,
    customer_orders.most_recent_order_date,
    coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

    )

    select * from final

  4. Execute dbt run

This time, when you performed a dbt run, separate views/tables were created for stg_customers, stg_orders and customers. dbt inferred the order to run these models. Because customers depends on stg_customers and stg_orders, dbt builds customers last. You do not need to explicitly define these dependencies.

FAQs

 
 
 As I create more models, how should I keep my project organized? What should I name my models?

Next steps

Before moving on from building your first models, make a change and see how it affects your results:

  • Write some bad SQL to cause an error — can you debug the error?
  • Run only a single model at a time. For more information, see Syntax overview.
  • Group your models with a stg_ prefix into a staging subdirectory. For example, models/staging/stg_customers.sql.
    • Configure your staging models to be views.
    • Run only the staging models.

You can also explore:

  • The target directory to see all of the compiled SQL. The run directory shows the create or replace table statements that are running, which are the select statements wrapped in the correct DDL.
  • The logs file to see how dbt Core logs all of the action happening within your project. It shows the selects statements that are running and the python logging happening when dbt runs.

Test and document your project

Add tests to your models

Adding tests to a project helps validate that your models are working correctly.

To add tests to your project:

  1. Create a new YAML file in the models directory, named models/schema.yml

  2. Add the following contents to the file:

    models/schema.yml
    version: 2

    models:
    - name: customers
    columns:
    - name: customer_id
    tests:
    - unique
    - not_null

    - name: stg_customers
    columns:
    - name: customer_id
    tests:
    - unique
    - not_null

    - name: stg_orders
    columns:
    - name: order_id
    tests:
    - unique
    - not_null
    - name: status
    tests:
    - accepted_values:
    values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
    - name: customer_id
    tests:
    - not_null
    - relationships:
    to: ref('stg_customers')
    field: customer_id

  3. Run dbt test, and confirm that all your tests passed.

When you run dbt test, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful.

FAQs

 What tests are available for me to use in dbt? Can I add my own custom tests?
 
 
 Does my test file need to be named `schema.yml`?
 
 
 

Document your models

Adding documentation to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project.

  1. Update your models/schema.yml file to include some descriptions, such as those below.

    models/schema.yml
    version: 2

    models:
    - name: customers
    description: One record per customer
    columns:
    - name: customer_id
    description: Primary key
    tests:
    - unique
    - not_null
    - name: first_order_date
    description: NULL when a customer has not yet placed an order.

    - name: stg_customers
    description: This model cleans up customer data
    columns:
    - name: customer_id
    description: Primary key
    tests:
    - unique
    - not_null

    - name: stg_orders
    description: This model cleans up order data
    columns:
    - name: order_id
    description: Primary key
    tests:
    - unique
    - not_null
    - name: status
    tests:
    - accepted_values:
    values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']

  2. Run dbt docs generate to generate the documentation for your project. dbt introspects your project and your warehouse to generate a JSON file with rich documentation about your project.

  1. Run dbt docs serve command to launch the documentation in a local website.

FAQs

 
 

Next steps

Before moving on from testing, make a change and see how it affects your results:

  • Write a test that fails, for example, omit one of the order statuses in the accepted_values list. What does a failing test look like? Can you debug the failure?
  • Run the tests for one model only. If you grouped your stg_ models into a directory, try running the tests for all the models in that directory.
  • Use a docs block to add a Markdown description to a model.

Commit updated changes

You need to commit the changes you made to the project so that the repository has your latest code.

  1. Add all your changes to git: git add -A
  2. Commit your changes: git commit -m "Add customers model, tests, docs"
  3. Push your changes to your repository: git push
  4. Navigate to your repository, and open a pull request to merge the code into your master branch.

Schedule a job

We recommend using dbt Cloud to schedule a job. For more information about using dbt Core to schedule a job, see dbt airflow blog post or deployments.

0