Quickstart for dbt Cloud and Snowflake
Introduction
In this quickstart guide, you'll learn how to use dbt Cloud with Snowflake. It will show you how to:
- Create a new Snowflake worksheet.
- Load sample data into your Snowflake account.
- Connect dbt Cloud to Snowflake.
- Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement.
- Add sources to your dbt project. Sources allow you to name and describe the raw data already loaded into Snowflake.
- Add tests to your models.
- Document your models.
- Schedule a job to run.
Snowflake also provides a quickstart for you to learn how to use dbt Cloud. It makes use of a different public dataset (Knoema Economy Data Atlas) than what's shown in this guide. For more information, refer to Accelerating Data Teams with dbt Cloud & Snowflake in the Snowflake docs.
You can check out dbt Fundamentals for free if you're interested in course learning with videos.
You can also watch the YouTube video on dbt and Snowflake.
Prerequisites
- You have a dbt Cloud account.
- You have a trial Snowflake account. During trial account creation, make sure to choose the Enterprise Snowflake edition so you have ACCOUNTADMIN access. For a full implementation, you should consider organizational questions when choosing a cloud provider. For more information, see Introduction to Cloud Platforms in the Snowflake docs. For the purposes of this setup, all cloud providers and regions will work, so choose whichever you'd like.
Related content
- Learn more with dbt Learn courses
- How we configure Snowflake
- CI jobs
- Deploy jobs
- Job notifications
- Source freshness
Create a new Snowflake worksheet
- Log in to your trial Snowflake account.
- In the Snowflake UI, click + Worksheet in the upper right corner to create a new worksheet.
Load data
The data used here is stored as CSV files in a public S3 bucket. The following steps guide you through preparing your Snowflake account for that data and loading it.
- Create a new virtual warehouse, two new databases (one for raw data, the other for future dbt development), and two new schemas (one for jaffle_shop data, the other for stripe data). To do this, run these SQL commands by typing them into the Editor of your new Snowflake worksheet and clicking Run in the upper right corner of the UI:
create warehouse transforming;
create database raw;
create database analytics;
create schema raw.jaffle_shop;
create schema raw.stripe;
- In the raw database and the jaffle_shop and stripe schemas, create three tables and load relevant data into them:
  - First, delete all contents in the Editor of the Snowflake worksheet so that it is empty. Then, run this SQL command to create the customers table:
create table raw.jaffle_shop.customers
( id integer,
first_name varchar,
last_name varchar
);
  - Delete all contents in the Editor, then run this command to load data into the customers table:
copy into raw.jaffle_shop.customers (id, first_name, last_name)
from 's3://dbt-tutorial-public/jaffle_shop_customers.csv'
file_format = (
type = 'CSV'
field_delimiter = ','
skip_header = 1
);
  - Delete all contents in the Editor, then run this command to create the orders table:
create table raw.jaffle_shop.orders
( id integer,
user_id integer,
order_date date,
status varchar,
_etl_loaded_at timestamp default current_timestamp
);
  - Delete all contents in the Editor, then run this command to load data into the orders table:
copy into raw.jaffle_shop.orders (id, user_id, order_date, status)
from 's3://dbt-tutorial-public/jaffle_shop_orders.csv'
file_format = (
type = 'CSV'
field_delimiter = ','
skip_header = 1
);
  - Delete all contents in the Editor, then run this command to create the payment table:
create table raw.stripe.payment
( id integer,
orderid integer,
paymentmethod varchar,
status varchar,
amount integer,
created date,
_batched_at timestamp default current_timestamp
);
  - Delete all contents in the Editor, then run this command to load data into the payment table:
copy into raw.stripe.payment (id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-tutorial-public/stripe_payments.csv'
file_format = (
type = 'CSV'
field_delimiter = ','
skip_header = 1
);
- Verify that the data is loaded by running these SQL queries. Confirm that you can see output for each one.
select * from raw.jaffle_shop.customers;
select * from raw.jaffle_shop.orders;
select * from raw.stripe.payment;
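The skip_header = 1 setting in each copy into command tells Snowflake to ignore the CSV's first line. The sketch below shows the effect locally. It uses Python's built-in csv and sqlite3 modules as a stand-in for Snowflake's COPY INTO, and it runs against a tiny CSV whose rows are hypothetical, not the real contents of jaffle_shop_customers.csv. It never touches S3 or Snowflake.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for jaffle_shop_customers.csv: a header row plus
# three made-up customers. The real file is only read by Snowflake's COPY INTO.
CSV_TEXT = """id,first_name,last_name
1,Ada,L.
2,Grace,H.
3,Alan,T.
"""


def load_customers(conn: sqlite3.Connection, csv_text: str, skip_header: int = 1) -> int:
    """Roughly mimic COPY INTO with TYPE = 'CSV', FIELD_DELIMITER = ',', SKIP_HEADER = 1."""
    conn.execute(
        "create table if not exists customers (id integer, first_name varchar, last_name varchar)"
    )
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=","))
    data_rows = rows[skip_header:]  # drop the header line(s), like skip_header
    conn.executemany(
        "insert into customers (id, first_name, last_name) values (?, ?, ?)", data_rows
    )
    return len(data_rows)


conn = sqlite3.connect(":memory:")
loaded = load_customers(conn, CSV_TEXT)
# The verification step: select the rows back and confirm there is output.
customers = conn.execute("select * from customers order by id").fetchall()
print(loaded, customers)
```

With skip_header = 0, the header line would be treated as a data row. In Snowflake, its "id" value would not parse as an integer.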
Connect dbt Cloud to Snowflake
There are two ways to connect dbt Cloud to Snowflake. The first option is Partner Connect, which provides a streamlined setup to create your dbt Cloud account from within your new Snowflake trial account. The second option is to create your dbt Cloud account separately and build the Snowflake connection yourself (connect manually). If you want to get started quickly, dbt Labs recommends using Partner Connect. If you want to customize your setup from the very beginning and gain familiarity with the dbt Cloud setup flow, dbt Labs recommends connecting manually.
Use Partner Connect
Using Partner Connect allows you to create a complete dbt account with your Snowflake connection, a managed repository, environments, and credentials.
- In the Snowflake UI, click the home icon in the upper left corner. In the left sidebar, select Data Products, then select Partner Connect. Find the dbt tile by scrolling or by searching for dbt in the search bar. Click the tile to connect to dbt.
If you're using the classic version of the Snowflake UI, you can click the Partner Connect button in the top bar of your account. From there, click the dbt tile to open the connect box.
- In the Connect to dbt popup, find the Optional Grant option and select the RAW and ANALYTICS databases. This grants your new dbt user role access to each database. Then, click Connect.
- Click Activate when a popup appears.
- After the new tab loads, you will see a form. If you already created a dbt Cloud account, you will be asked to provide an account name. If you haven't created an account, you will be asked to provide an account name and password.
- After you have filled out the form and clicked Complete Registration, you will be logged into dbt Cloud automatically.
- Go to the left side menu and click your account name, then select Account settings, choose the "Partner Connect Trial" project, and select Snowflake in the overview table. Select Edit and update the Database and Warehouse fields to analytics and transforming, respectively.
Connect manually
- Create a new project in dbt Cloud. Navigate to Account settings (by clicking your account name in the left side menu), and click + New Project.
- Enter a project name and click Continue.
- For the warehouse, click Snowflake, then Next to set up your connection.
- Enter your Settings for Snowflake with:
  - Account: Find your account by using the Snowflake trial account URL and removing snowflakecomputing.com. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like oq65696.west-us-2.azure.snowflakecomputing.com. The AppUI or Snowsight URL might look more like snowflakecomputing.com/west-us-2.azure/oq65696. In both examples, your account will be oq65696.west-us-2.azure. For more information, see Account Identifiers in the Snowflake docs.
    ✅ db5261993 or db5261993.east-us-2.azure
    ❌ db5261993.eu-central-1.snowflakecomputing.com
  - Role: Leave blank for now. You can update this to a default Snowflake role later.
  - Database: analytics. This tells dbt to create new models in the analytics database.
  - Warehouse: transforming. This tells dbt to use the transforming warehouse that was created earlier.
- Enter your Development Credentials for Snowflake with:
  - Username: The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word.
  - Password: The password you set when creating your Snowflake account.
  - Schema: You'll notice that the schema name has been auto-created for you. By convention, this is dbt_<first-initial><last-name>. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
  - Target name: Leave as the default.
  - Threads: Leave as 4. This is the number of simultaneous connections that dbt Cloud will make to build models concurrently.
- Click Test Connection. This verifies that dbt Cloud can access your Snowflake account.
- If the connection test succeeds, click Next. If it fails, you may need to check your Snowflake settings and credentials.
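The rule for the Account field is mechanical enough to sketch in code. The Python helper below, account_identifier, is hypothetical. It handles only the two URL shapes shown in the Account example: a Classic console host ending in snowflakecomputing.com, and a Snowsight-style path of region and cloud followed by the locator. Other identifier formats, such as organization-based account names, are outside what it covers; for those, follow Snowflake's Account Identifiers docs.

```python
from urllib.parse import urlparse

SUFFIX = "snowflakecomputing.com"


def account_identifier(url: str) -> str:
    """Hypothetical helper: derive the dbt Cloud Account value from a Snowflake URL."""
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to recognize the host
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.endswith("." + SUFFIX):
        # Classic console: <locator>.<region>.<cloud>.snowflakecomputing.com
        return host[: -len("." + SUFFIX)]
    parts = [p for p in parsed.path.split("/") if p]
    if host == SUFFIX and len(parts) >= 2:
        # Snowsight-style: snowflakecomputing.com/<region>.<cloud>/<locator>
        region_and_cloud, locator = parts[0], parts[1]
        return f"{locator}.{region_and_cloud}"
    raise ValueError(f"unrecognized Snowflake URL: {url}")


print(account_identifier("oq65696.west-us-2.azure.snowflakecomputing.com"))
print(account_identifier("snowflakecomputing.com/west-us-2.azure/oq65696"))
```

Both calls produce the same Account value. That matches the point above: only the order of the pieces differs between the two URL styles.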
Set up a dbt Cloud managed repository
If you used Partner Connect, you can skip to initializing your dbt project, because Partner Connect provides you with a managed repository. Otherwise, you will need to create your repository connection.
When you develop in dbt Cloud, you can leverage Git to version control your code.
To connect to a repository, you can either set up a dbt Cloud-hosted managed repository or directly connect to a supported git provider. Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and continuous integration.
To set up a managed repository:
- Under "Setup a repository", select Managed.
- Type a name for your repo, such as bbaggins-dbt-quickstart.
- Click Create. It will take a few seconds for your repository to be created and imported.
- Once you see the "Successfully imported repository" message, click Continue.
Initialize your dbt project and start developing
Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:
- Click Start developing in the IDE. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
- Above the file tree to the left, click Initialize your project. This builds out your folder structure with example models.
- Make your initial commit by clicking Commit and sync. Use the commit message initial commit. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
- You can now directly query data from your warehouse and execute dbt run. You can try this out now:
  - Click + Create new file, add this query to the new file, and click Save as to save the new file:
select * from raw.jaffle_shop.customers
  - In the command line bar at the bottom, enter dbt run and press Enter. You should see a dbt run succeeded message.
Build your first model
You have two options for working with files in the dbt Cloud IDE:
- Create a new branch (recommended): Create a new branch to edit and commit your changes. Navigate to Version Control on the left sidebar and click Create branch.
- Edit in the protected primary branch: Use this option if you prefer to edit, format, or lint files and execute dbt commands directly in your primary git branch. The dbt Cloud IDE prevents commits to the protected branch, so you will be prompted to commit your changes to a new branch.
Name the new branch add-customers-model.
- Click the ... next to the models directory, then select Create file.
- Name the file customers.sql, then click Create.
- Copy the following query into the file and click Save.
with customers as (
select
id as customer_id,
first_name,
last_name
from raw.jaffle_shop.customers
),
orders as (
select
id as order_id,
user_id as customer_id,
order_date,
status
from raw.jaffle_shop.orders
),
customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from orders
group by 1
),
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders
from customers
left join customer_orders using (customer_id)
)
select * from final
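- Enter dbt run in the command prompt at the bottom of the screen. You should get a successful run and see three models: your new customers model and the two example models that were created when you initialized the project.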
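You can check what the customers model computes before running it in Snowflake. The sketch below uses Python's sqlite3 module to run the same CTE logic against two hypothetical customers and two hypothetical orders. Because this is SQLite rather than Snowflake, the sketch makes two assumptions. First, the raw.jaffle_shop.* names are replaced with plain local table names. Second, group by 1 is spelled out as group by customer_id.

```python
import sqlite3

# The customers model from above, adapted to local SQLite table names.
MODEL_SQL = """
with customers as (
    select id as customer_id, first_name, last_name
    from jaffle_shop_customers
),
orders as (
    select id as order_id, user_id as customer_id, order_date, status
    from jaffle_shop_orders
),
customer_orders as (
    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders
    from orders
    group by customer_id
),
final as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders
    from customers
    left join customer_orders using (customer_id)
)
select * from final order by customer_id
"""

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: Ada has two orders, Grace has none.
conn.executescript("""
create table jaffle_shop_customers (id integer, first_name varchar, last_name varchar);
create table jaffle_shop_orders (id integer, user_id integer, order_date date, status varchar);
insert into jaffle_shop_customers values (1, 'Ada', 'L.'), (2, 'Grace', 'H.');
insert into jaffle_shop_orders values
    (10, 1, '2018-01-01', 'completed'),
    (11, 1, '2018-02-01', 'placed');
""")
rows = conn.execute(MODEL_SQL).fetchall()
for row in rows:
    print(row)
```

Grace has no orders. The left join therefore leaves her order dates null, and coalesce turns her missing order count into 0.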
Later, you can connect your business intelligence (BI) tools to these views and tables so they read cleaned-up data rather than raw data.
Change the way your model is materialized
One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can change things between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes.
By default, everything gets created as a view. You can override that at the directory level so that every model in that directory uses a different materialization.
- Edit your dbt_project.yml file.
  - Update your project name to:
dbt_project.yml
name: 'jaffle_shop'
  - Configure jaffle_shop so everything in it will be materialized as a table, and configure example so everything in it will be materialized as a view. Update your models config block to:
dbt_project.yml
models:
  jaffle_shop:
    +materialized: table
    example:
      +materialized: view
  - Click Save.
- Enter the dbt run command. Your customers model should now be built as a table!
Note: To do this, dbt had to first run a drop view statement (or API call on BigQuery), then a create table as statement.
- Edit models/customers.sql to override the dbt_project.yml setting for the customers model only by adding the following snippet to the top, and click Save:
models/customers.sql
{{
  config(
    materialized='view'
  )
}}

with customers as (

    select
        id as customer_id
        ...

)
- Enter the dbt run command. Your customers model should now build as a view.
  - BigQuery users need to run dbt run --full-refresh instead of dbt run to fully apply materialization changes.
- Enter the dbt run --full-refresh command for this to take effect in your warehouse.
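These steps rely on a precedence order. dbt's default is view. The directory-level settings in dbt_project.yml override that default, with the most specific directory winning. A config() block in the model file overrides everything else. The sketch below models that order as a small lookup. It is a simplified, hypothetical model of the behavior described here, not dbt's actual config-resolution code. The dictionary mirrors the models block from dbt_project.yml, and resolve_materialization is an illustrative name.

```python
from typing import Optional

# Mirrors the models: block in dbt_project.yml from the steps above.
PROJECT_MODELS_CONFIG = {
    "jaffle_shop": {
        "+materialized": "table",
        "example": {"+materialized": "view"},
    }
}


def resolve_materialization(
    project: str, model_dirs: list[str], file_config: Optional[str] = None
) -> str:
    """Hypothetical helper: later, more specific settings override earlier ones."""
    materialized = "view"  # dbt's default when nothing else is configured
    node = PROJECT_MODELS_CONFIG.get(project, {})
    materialized = node.get("+materialized", materialized)
    # Walk down the directory path under models/, e.g. ["example"].
    for directory in model_dirs:
        node = node.get(directory, {})
        materialized = node.get("+materialized", materialized)
    # A config(materialized=...) block inside the model file wins over dbt_project.yml.
    if file_config is not None:
        materialized = file_config
    return materialized


print(resolve_materialization("jaffle_shop", []))                      # models/customers.sql
print(resolve_materialization("jaffle_shop", ["example"]))             # models/example/*.sql
print(resolve_materialization("jaffle_shop", [], file_config="view"))  # customers.sql with config()
```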
Delete the example models
You can now delete the files that dbt created when you initialized the project:
- Delete the models/example/ directory.
- Delete the example: key from your dbt_project.yml file, and any configurations that are listed under it.
dbt_project.yml
# before
models:
  jaffle_shop:
    +materialized: table
    example:
      +materialized: view

dbt_project.yml
# after
models:
  jaffle_shop:
    +materialized: table
- Save your changes.
Build models on top of other models
As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs).
Now you can experiment by moving that logic into separate models and using the ref function to build models on top of other models:
- Create a new SQL file, models/stg_customers.sql, with the SQL from the customers CTE in your original query.
- Create a second new SQL file, models/stg_orders.sql, with the SQL from the orders CTE in your original query.
models/stg_customers.sql
select
id as customer_id,
first_name,
last_name
from raw.jaffle_shop.customers

models/stg_orders.sql
select
id as order_id,
user_id as customer_id,
order_date,
status
from raw.jaffle_shop.orders
- Edit the SQL in your models/customers.sql file as follows:
models/customers.sql
with customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
),
customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from orders
group by 1
),
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders
from customers
left join customer_orders using (customer_id)
)
select * from final
- Execute dbt run.
This time, when you performed a dbt run, separate views/tables were created for stg_customers, stg_orders, and customers. dbt inferred the order in which to run these models. Because customers depends on stg_customers and stg_orders, dbt builds customers last. You do not need to explicitly define these dependencies.
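dbt works out that build order from the ref() calls. The Python sketch below is a rough, hypothetical illustration of the idea. It extracts ref('...') names from each model's SQL with a regular expression, then orders the models with the standard library's graphlib.TopologicalSorter. dbt itself parses Jinja rather than using a regex, and the model SQL strings here are abbreviated copies of the files above.

```python
import re
from graphlib import TopologicalSorter

# Abbreviated, hypothetical copies of the three model files.
MODELS = {
    "stg_customers": "select id as customer_id, first_name, last_name from raw.jaffle_shop.customers",
    "stg_orders": "select id as order_id, user_id as customer_id, order_date, status from raw.jaffle_shop.orders",
    "customers": (
        "with customers as (select * from {{ ref('stg_customers') }}), "
        "orders as (select * from {{ ref('stg_orders') }}) select ..."
    ),
}

# Matches {{ ref('name') }} or {{ ref("name") }} and captures the model name.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def build_order(models: dict[str, str]) -> list[str]:
    # Map each model to the set of models it depends on (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)
```

The two staging models may appear in either order because neither depends on the other. The customers model always comes last.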
Build models on top of sources
Sources make it possible to name and describe the data loaded into your warehouse by your extract and load tools. By declaring these tables as sources in dbt, you can:
- select from source tables in your models using the {{ source() }} function, helping define the lineage of your data
- test your assumptions about your source data
- calculate the freshness of your source data
- Create a new YML file, models/sources.yml.
- Declare the sources by copying the following into the file and clicking Save.
models/sources.yml
version: 2

sources:
  - name: jaffle_shop
    description: This is a replica of the Postgres database used by our app
    database: raw
    schema: jaffle_shop
    tables:
      - name: customers
        description: One record per customer.
      - name: orders
        description: One record per order. Includes cancelled and deleted orders.
- Edit the models/stg_customers.sql file to select from the customers table in the jaffle_shop source.
models/stg_customers.sql
select
id as customer_id,
first_name,
last_name
from {{ source('jaffle_shop', 'customers') }}
- Edit the models/stg_orders.sql file to select from the orders table in the jaffle_shop source.
models/stg_orders.sql
select
id as order_id,
user_id as customer_id,
order_date,
status
from {{ source('jaffle_shop', 'orders') }}
- Execute dbt run.
The results of your dbt run will be exactly the same as in the previous step. Your stg_customers and stg_orders models will still query from the same raw data source in Snowflake. By using source, you can test and document your raw data and also understand the lineage of your sources.
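Here is a hypothetical Python sketch of what source('jaffle_shop', 'customers') resolves to. Instead of parsing YAML, it uses a dictionary that mirrors models/sources.yml. Real dbt also handles quoting and adapter-specific rendering, which this ignores. The one default modeled here is that schema falls back to the source name when it is omitted.

```python
# Mirrors models/sources.yml (hypothetical in-memory form, no YAML parser needed).
SOURCES = [
    {
        "name": "jaffle_shop",
        "database": "raw",
        "schema": "jaffle_shop",
        "tables": [{"name": "customers"}, {"name": "orders"}],
    }
]


def source(source_name: str, table_name: str) -> str:
    """Hypothetical stand-in for dbt's source(): return database.schema.table."""
    for src in SOURCES:
        if src["name"] != source_name:
            continue
        if any(table["name"] == table_name for table in src["tables"]):
            schema = src.get("schema", source_name)  # schema defaults to the source name
            return f'{src["database"]}.{schema}.{table_name}'
        break
    raise KeyError(f"source {source_name}.{table_name} is not declared")


print(source("jaffle_shop", "customers"))
```

Because every reference goes through the declaration, an undeclared table fails loudly instead of silently querying an unknown relation. That is also what lets dbt draw lineage from raw tables to models.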
Add tests to your models
Adding tests to a project helps validate that your models are working correctly.
To add tests to your project:
- Create a new YAML file in the models directory, named models/schema.yml.
- Add the following contents to the file:
models/schema.yml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
- Run dbt test, and confirm that all your tests passed.
When you run dbt test, dbt iterates through your YAML files and constructs a query for each test. Each query returns the number of records that fail the test. If this number is 0, the test is successful.
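The sketch below shows that mechanism with Python's sqlite3 module in place of Snowflake. Each function builds a simplified query in the spirit of dbt's unique, not_null, accepted_values, and relationships tests, then returns the failing-record count. These queries are illustrative approximations, not the exact SQL dbt generates. The rows are hypothetical, and two of them (the 'lost' status and an order for a nonexistent customer 3) are chosen so that two tests fail.

```python
import sqlite3

# Simplified stand-ins for dbt's generic tests. Each returns a failing-record
# count; 0 means the test passes. Table and column names come from trusted
# constants below, so building SQL with f-strings is acceptable in this sketch.


def unique_failures(conn, table, column):
    # Count values that appear more than once.
    return conn.execute(
        f"select count(*) from (select {column} from {table} where {column} is not null "
        f"group by {column} having count(*) > 1)"
    ).fetchone()[0]


def not_null_failures(conn, table, column):
    return conn.execute(f"select count(*) from {table} where {column} is null").fetchone()[0]


def accepted_values_failures(conn, table, column, values):
    placeholders = ", ".join("?" for _ in values)
    return conn.execute(
        f"select count(*) from {table} where {column} not in ({placeholders})", values
    ).fetchone()[0]


def relationships_failures(conn, table, column, to_table, field):
    # Count child rows whose key has no matching row in the parent model.
    return conn.execute(
        f"select count(*) from {table} as child left join {to_table} as parent "
        f"on child.{column} = parent.{field} "
        f"where child.{column} is not null and parent.{field} is null"
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
create table stg_customers (customer_id integer);
create table stg_orders (order_id integer, customer_id integer, status varchar);
insert into stg_customers values (1), (2);
insert into stg_orders values (10, 1, 'completed'), (11, 2, 'placed'), (12, 3, 'lost');
""")
ACCEPTED = ["placed", "shipped", "completed", "return_pending", "returned"]
results = {
    "unique stg_customers.customer_id": unique_failures(conn, "stg_customers", "customer_id"),
    "not_null stg_orders.order_id": not_null_failures(conn, "stg_orders", "order_id"),
    "accepted_values stg_orders.status": accepted_values_failures(conn, "stg_orders", "status", ACCEPTED),
    "relationships stg_orders.customer_id": relationships_failures(
        conn, "stg_orders", "customer_id", "stg_customers", "customer_id"
    ),
}
for name, failures in results.items():
    print(f"{'PASS' if failures == 0 else 'FAIL'} {name}: {failures} failing record(s)")
```

Against your real models, all of these counts should be 0, which dbt test reports as passing.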
Document your models
Adding documentation to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project.
- Update your models/schema.yml file to include some descriptions, such as those below.
models/schema.yml
version: 2

models:
  - name: customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: first_order_date
        description: NULL when a customer has not yet placed an order.

  - name: stg_customers
    description: This model cleans up customer data
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null

  - name: stg_orders
    description: This model cleans up order data
    columns:
      - name: order_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
- Run dbt docs generate to generate the documentation for your project. dbt introspects your project and your warehouse to generate a JSON file with rich documentation about your project.
- Click the book icon in the Develop interface to launch documentation in a new tab.
Commit your changes
Now that you've built your customers model, you need to commit the changes you made to the project so that the repository has your latest code.
If you edited directly in the protected primary branch:
- Click the Commit and sync git button. This action prepares your changes for commit.
- A modal titled Commit to a new branch will appear.
- In the modal window, name your new branch add-customers-model. This branches off from your primary branch with your new changes.
- Add a commit message, such as "Add customers model, tests, docs", and commit your changes.
- Click Merge this branch to main to add these changes to the main branch on your repo.
If you created a new branch before editing:
- Since you already branched out of the primary protected branch, go to Version Control on the left.
- Click Commit and sync to add a message.
- Add a commit message, such as "Add customers model, tests, docs."
- Click Merge this branch to main to add these changes to the main branch on your repo.
Deploy dbt
Use dbt Cloud's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps.
Create a deployment environment
- In the upper left, select Deploy, then click Environments.
- Click Create Environment.
- In the Name field, write the name of your deployment environment. For example, "Production."
- In the dbt Version field, select the latest version from the dropdown.
- Under Deployment connection, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema".
- Click Save.
Create and run a job
Jobs are a set of dbt commands that you want to run on a schedule, for example, dbt build.
As the jaffle_shop business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the customers model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job.
- After creating your deployment environment, you should be directed to the page for a new environment. If not, select Deploy in the upper left, then click Jobs.
- Click Create one and provide a name, for example, "Production run", and link to the Environment you just created.
- Scroll down to the Execution Settings section.
- Under Commands, add this command as part of your job if you don't see it:
dbt build
- Select the Generate docs on run checkbox to automatically generate updated project docs each time your job runs.
- For this exercise, do not set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as deploying a project.
- Select Save, then click Run now to run your job.
- Click the run and watch its progress under "Run history."
- Once the run is complete, click View Documentation to see the docs for your project.
Congratulations 🎉! You've just deployed your first dbt project!