Build your first models

With our starter project setup, it's time to get to the fun part — building models! We're going to take the query from the Setting up instructions, and turn it into a model in our dbt project.

Build your first model

dbt Cloud

  1. Ensure you're in the Develop interface. If you're not, click the hamburger menu, and then Develop.
  2. Create a new file in the models directory named models/customers.sql.
  3. Paste the query from the Setting up instructions into the file.
  4. Execute dbt run in the command prompt at the bottom of the screen. You should get a successful run, like so:
A successful run with dbt Cloud
A successful run with dbt Cloud

If you switch back to the BigQuery console you'll be able to select from this model.

dbt CLI

  1. Open your project in a code editor
  2. Create a new SQL file in the models directory, named models/customers.sql.
  3. Paste the query from the Setting up instructions into the file.
  4. From the command line, execute dbt run. Your output should look like this:
A successful run with the dbt CLI
A successful run with the dbt CLI

If you switch back to the BigQuery console you'll be able to select from this model.

FAQs

 How can I see the SQL that dbt is running?
 How did dbt choose which schema to build my models in?
 Do I need to create my target schema before running dbt?
 If I rerun dbt, will there be any downtime as models are rebuilt?
 What happens if the SQL in my query is bad?

Change the way your model is materialized

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. Let's see this in action.

  1. Edit the following in your dbt_project.yml file:
models:
jaffle_shop:
materialized: table
example:
materialized: view
  1. Execute dbt run. Your model, customers should now be built as a table!
  1. Edit models/customers.sql to have the following snippet at the top:
{{
config(
materialized='view'
)
}}
with customers as (
select
id as customer_id
...
)
  1. Execute dbt run. Your model, customers should be built as a view. You may need to run dbt run --full-refresh for this to take effect on BigQuery.

FAQs

 What materializations are available in dbt?
 Which materialization should I use for my model?
 What other model configurations are there?

Delete the example models

We don't need the sample files that dbt created for us anymore! Let's delete them.

  1. Delete the models/example/ directory
  2. Delete the example: key from your dbt_project.yml file, and any configurations that are listed under it
# before
models:
jaffle_shop:
materialized: table
example:
materialized: view
# after
models:
jaffle_shop:
materialized: table

FAQs

 How do I remove deleted models from my data warehouse?
 I got an "unused model configurations" error message, what does this mean?

Build models on top of other models

Often, it's a good idea to clean your data in one place, before doing additional transformations downstream. Our query already uses CTEs to this effect, but now we're going to experiment with using the ref function to separate this clean-up into upstream models, like so:

The DAG we want for our dbt project
The DAG we want for our dbt project
  1. Create a new SQL file, models/stg_customers.sql, with the SQL from the customers CTE in our original query:
select
id as customer_id,
first_name,
last_name
from `dbt-tutorial`.jaffle_shop.customers
  1. Create a second new SQL file, models/stg_orders.sql, with the SQL from the orders CTE in our original query:
select
id as order_id,
user_id as customer_id,
order_date,
status
from `dbt-tutorial`.jaffle_shop.orders
  1. Edit the SQL in your models/customers.sql file as follows:
with customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
),
customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from orders
group by 1
),
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders
from customers
left join customer_orders using (customer_id)
)
select * from final
  1. Execute dbt run

This time when dbt ran, separate views/tables were created for stg_customers, stg_orders and customers. dbt was able to infer the order in which to run these models in — customers depends on stg_customers and stg_orders, so gets built last. There's no need to explicitly define these dependencies.

FAQs

 How do I run one model at a time?
 Do model names need to be unique?
 As I create more models, how should I keep my project organized? What should I name my models?

Extra exercises

  • Write some bad SQL to cause an error — can you debug this error?
  • Run only a single model at a time (docs)
  • Group your models with a stg_ prefix into a staging subdirectory (i.e. models/staging/stg_customers.sql)
    • Configure your staging models to be views
    • Run only the staging models