For the Redshift project in the getting started guide, you'll learn how to set up Redshift and connect it to dbt Cloud.
This guide will walk you through:
- Setting up a Redshift cluster
- Loading training data into your Redshift account
- Connecting dbt Cloud and Redshift
Before beginning this tutorial you will need access to an existing AWS account with permissions to execute a CloudFormation template to create appropriate roles and a Redshift cluster. If you do not have an AWS account, head over to Sign up for AWS.
Let’s get started by accessing your AWS account and setting up Redshift.
- Sign into your AWS account on the AWS sign in page as a root user or IAM user depending on your level of access.
- We will be using a CloudFormation template to quickly set up a Redshift instance. A CloudFormation template is a configuration file (provided here as a JSON file) that will automatically spin up the necessary resources in AWS. Use the link below to start this process.
- Choose Next for each page until you reach the acknowledgement checkbox. Check the box for "I acknowledge that AWS CloudFormation might create IAM resources with custom names" and click Create Stack. You should land on the stack page with a "CREATE_IN_PROGRESS" status.
- When the stack status changes to "CREATE_COMPLETE", click on the Outputs tab at the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this page open in a tab.
- Type Redshift into the search bar at the top and click on Amazon Redshift.
Confirm that your new Redshift Cluster is listed under Cluster overview. Click on the cluster name.
Click on Query Data. You can choose the classic query editor or v2. We will be using v2 for the purpose of this guide.
You may be asked to Configure account. For the purpose of this sandbox environment, we recommend selecting “Configure account”.
Click on your cluster name in the list and fill out the credentials from the output of the stack:
- User Name: the admin user name from your CloudFormation stack outputs
- Password: choose your own password and save it for later
- Click on Create connection
Congrats! You have your Redshift cluster.
Now we are going to load our sample data into the S3 bucket that our CloudFormation template created. S3 buckets are a cheap and simple way to store data outside of Redshift.
- The data used in this course is stored as CSVs in a public S3 bucket. You can use the following URLs to download these files. Download these to your computer to use in the following steps.
- Now we are going to use the S3 bucket that you created via CloudFormation and upload the files. Go to the search bar at the top, type in S3, and click on S3. There will be sample data in the bucket already; feel free to ignore it or use it for other modeling exploration.
- Click on the name of the S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under "Workshopbucket" on the Outputs page. The bucket will be prefixed with dbt-data-lake.
- Click on Upload, drag the three files into the UI, and click on the Upload button.
- Save the name of the S3 bucket. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section.
- Now let’s go back to the Redshift query editor. Search for Redshift in the search bar, choose your cluster, and select Query data.
- In your query editor, execute the queries below to create the schemas that we will be placing your raw data into. You can highlight each statement and then click on Run to run them individually. If you are on the Classic Query Editor, you might need to input them separately into the UI. You should see these schemas listed under your database once they are created.
```sql
create schema if not exists jaffle_shop;
create schema if not exists stripe;
```
- Now create the tables in your schemas using the statements below. These will be populated as tables in the respective schemas.
```sql
create table jaffle_shop.customers(
    id integer,
    first_name varchar(50),
    last_name varchar(50)
);

create table jaffle_shop.orders(
    id integer,
    user_id integer,
    order_date date,
    status varchar(50),
    _etl_loaded_at timestamp default current_timestamp
);

create table stripe.payment(
    id integer,
    orderid integer,
    paymentmethod varchar(50),
    status varchar(50),
    amount integer,
    created date,
    _batched_at timestamp default current_timestamp
);
```
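As an optional sanity check before loading data, you can confirm the empty tables exist by querying `information_schema.tables`, which Redshift supports; this sketch assumes the DDL above ran without errors in the same database:

```sql
-- Expect three rows: jaffle_shop.customers, jaffle_shop.orders, stripe.payment
select table_schema, table_name
from information_schema.tables
where table_schema in ('jaffle_shop', 'stripe')
order by table_schema, table_name;
```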
- Now we need to copy the data from S3. Be sure to update the S3 location, IAM role, and region. You can find the S3 bucket and IAM role in the outputs from your CloudFormation stack.
```sql
copy jaffle_shop.customers(id, first_name, last_name)
from 's3://dbt-data-lake-xxxx/jaffle_shop_customers.csv'
iam_role '<your-iam-role-arn>'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars;

copy jaffle_shop.orders(id, user_id, order_date, status)
from 's3://dbt-data-lake-xxxx/jaffle_shop_orders.csv'
iam_role '<your-iam-role-arn>'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars;

copy stripe.payment(id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-data-lake-xxxx/stripe_payments.csv'
iam_role '<your-iam-role-arn>'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars;
```
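If a copy statement fails, Redshift records the details in its `stl_load_errors` system table. This debugging query is a general Redshift pattern (not part of the original guide) for viewing the most recent load failures:

```sql
-- Show the most recent load errors, newest first
select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 5;
```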
Ensure that you can run a select * from each of the tables with the following code snippets.
```sql
select * from jaffle_shop.customers;
select * from jaffle_shop.orders;
select * from stripe.payment;
```
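Beyond spot-checking rows, a count per table is a quick way to confirm that each copy statement actually loaded data (the exact counts depend on the CSVs you uploaded):

```sql
-- One row per table with its total row count; a count of 0 means the copy
-- for that table did not load anything.
select 'jaffle_shop.customers' as relation, count(*) as row_count from jaffle_shop.customers
union all
select 'jaffle_shop.orders', count(*) from jaffle_shop.orders
union all
select 'stripe.payment', count(*) from stripe.payment;
```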
Congratulations! At this point, you have created a Redshift instance and loaded training data. In the next section, we will walk through the next steps to connect dbt Cloud and Redshift.
Connecting to dbt Cloud
Create a dbt Cloud account
Let's start this section by creating a dbt Cloud account if you haven't already.
- Navigate to dbt Cloud.
- If you don't have a dbt Cloud account, create a new one, and verify your account via email.
- If you already have a dbt Cloud account, you can create a new project from your existing account:
- Click the gear icon in the top-right, then click Projects.
- Click + New Project.
- You've arrived at the "Setup a New Project" page.
- Type "Analytics" in the dbt Project Name field. You will be able to rename this project later.
- Click Continue.
Connect dbt Cloud to Redshift
Now let's set up the connection between dbt Cloud and Redshift.
Click Redshift to set up your connection.
For the name, write Redshift or another simple title.
Enter your Redshift settings, referencing the credentials you saved from the CloudFormation stack:
- Your hostname is the entire hostname. Make sure to drop the http:// at the beginning and any trailing slashes at the end.
- The port is 5439 (the Redshift default).
- The database is the database name listed in your CloudFormation stack outputs.
Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be:
- password: this was the password that you set earlier in the guide
- Schema: You’ll notice that the schema name has been auto-created for you. By convention, this is `dbt_<first-initial><last-name>`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
Click Test Connection at the bottom. This verifies that dbt Cloud can access your Redshift cluster.
If the connection test succeeds, click Next. If it fails, you may need to check your Redshift settings and credentials.
Initialize your repository and start development
When you develop in dbt Cloud, you can leverage Git to version control your code.
To connect to a repository, you can either set up a dbt Cloud-hosted managed repository or directly connect to a supported git provider. Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and continuous integration.
Setting up a managed repository
To set up a managed repository:
- Under "Add repository from", select Managed.
- Type a name for your repo (for example, `dbt-tutorial`).
- Click Create. It will take a few seconds for your repository to be created and imported.
- Once you see the "Successfully imported repository" message, click Continue.
Initialize your dbt project
Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:
- Click Develop from the upper left. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
- Above the file tree to the left, click Initialize your project. This builds out your folder structure with example models.
- Make your initial commit by clicking Commit. Use the commit message `initial commit`. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
- Now you should be able to directly query data from your warehouse and execute dbt run. Since this guide uses Redshift, paste the following code in the IDE:

```sql
select * from jaffle_shop.customers
```
- In the command line bar at the bottom, type `dbt run` and click Enter. We will explore what happens in the next section of the tutorial.
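Assuming `dbt run` completes successfully, the starter project's example models are built in your development schema. As a sketch, you could confirm this back in the Redshift query editor (here `dbt_jsmith` is a hypothetical example of the `dbt_<first-initial><last-name>` schema convention described earlier; substitute your own development schema):

```sql
-- my_first_dbt_model is part of dbt's starter project; replace dbt_jsmith
-- with your own development schema name.
select * from dbt_jsmith.my_first_dbt_model;
```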
Congratulations! You have successfully completed the following:
- Set up a Redshift cluster
- Loaded training data into your Redshift account
- Connected dbt Cloud and Redshift
You can continue the tutorial by building your first project. You will learn and apply the fundamentals of dbt with models, tests, docs, and deploying your first job!