
Set up and connect Redshift


For the Redshift project in the getting started guide, you'll learn how to set up Redshift and connect it to dbt Cloud.

This guide will walk you through:

  • Setting up a Redshift cluster
  • Loading training data into your Redshift account
  • Connecting dbt Cloud and Redshift


Before beginning this tutorial, you will need access to an existing AWS account with permissions to execute a CloudFormation template that creates appropriate roles and a Redshift cluster. If you do not have an AWS account, head over to Sign up for AWS.

Setting up

Let’s get started by accessing your AWS account and setting up Redshift.

  1. Sign into your AWS account on the AWS sign in page as a root user or IAM user depending on your level of access.
  2. We will use a CloudFormation template to quickly set up a Redshift instance. A CloudFormation template is a configuration file that automatically spins up the necessary resources in AWS. Use the link below to start this process. (source: CloudFormation JSON file)

Start CloudFormation Stack

  3. Click Next on each page until you reach the acknowledgement checkbox. Check the box for "I acknowledge that AWS CloudFormation might create IAM resources with custom names" and click Create Stack. You should land on the stack page with a CREATE_IN_PROGRESS status.

  4. When the stack status changes to CREATE_COMPLETE, click the Outputs tab at the top to view information that you will use throughout the rest of this guide. Keep this tab open so you can refer to these credentials later.

  5. Type Redshift into the search bar at the top and click Amazon Redshift.

  6. Confirm that your new Redshift cluster is listed under Cluster overview, then click the cluster name.

  7. Click Query data. You can choose the classic query editor or v2; we will use v2 for this guide.

  8. You may be asked to Configure account. For the purpose of this sandbox environment, we recommend selecting Configure account.

  9. Click your cluster name in the list and fill in the credentials from the output of the stack:

  • Database: dbtworkshop
  • User Name: dbtadmin
  • Password: choose your own password and save it for later

  10. Click Create Connection.

Congrats! You have your Redshift cluster.
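If you prefer to sanity-check the cluster outside the console, the same credentials work from any Postgres-compatible client. Here is a minimal sketch; the `redshift_dsn` helper and the endpoint placeholder are our own illustrations, not part of the official guide:

```python
# A hedged sketch: verifying the new cluster from Python instead of the query
# editor. The endpoint is a placeholder -- use the one shown on your cluster's
# details page in the Redshift console.
def redshift_dsn(host, password, dbname="dbtworkshop",
                 user="dbtadmin", port=5439):
    """Build a libpq-style connection string for the workshop cluster."""
    return (f"host={host} port={port} dbname={dbname} "
            f"user={user} password={password}")

# With psycopg2 installed, a quick smoke test would look like:
# import psycopg2
# conn = psycopg2.connect(
#     redshift_dsn("<endpoint>.redshift.amazonaws.com", "<password>"))
# with conn.cursor() as cur:
#     cur.execute("select 1")
#     print(cur.fetchone())
```

The defaults mirror the credentials above: database dbtworkshop, user dbtadmin, and Redshift's standard port 5439.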

Loading data

Now we are going to load our sample data into the S3 bucket that our CloudFormation template created. S3 buckets are a cheap and simple way to store data outside of Redshift.
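The console upload is described in the steps below; if you script your setup instead, the same upload can be done with boto3. This is a hedged sketch: the bucket name is the usual placeholder from your CloudFormation outputs, and `upload_plan` is an illustrative helper, not part of the guide:

```python
# A hedged alternative to the console upload: stage the three CSVs at the
# bucket root with boto3. Replace the bucket name with the "Workshopbucket"
# value from your CloudFormation outputs.
import os

CSV_FILES = [
    "jaffle_shop_customers.csv",
    "jaffle_shop_orders.csv",
    "stripe_payments.csv",
]

def upload_plan(files, bucket="dbt-data-lake-xxxx"):
    """Map each local file to its (bucket, key) destination at the bucket root."""
    return [(bucket, os.path.basename(f)) for f in files]

# With boto3 installed and AWS credentials configured:
# import boto3
# s3 = boto3.client("s3")
# for bucket, key in upload_plan(CSV_FILES):
#     s3.upload_file(key, bucket, key)
```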

  1. The data used in this course is stored as CSVs in a public S3 bucket. Download the three files (jaffle_shop_customers.csv, jaffle_shop_orders.csv, and stripe_payments.csv) to your computer to use in the following steps.
  2. Now we are going to upload the files to the S3 bucket that you created via CloudFormation. Go to the search bar at the top, type in S3, and click S3. There may be sample data in the bucket already; feel free to ignore it or use it for other modeling exploration.
  3. Click the name of your S3 bucket. If you have multiple S3 buckets, this will be the bucket listed under "Workshopbucket" on the Outputs page. The bucket name will be prefixed with dbt-data-lake.
  4. Click Upload, drag the three files into the UI, and click the Upload button at the bottom.
  5. Save the name of the S3 bucket. It should look like this: s3://dbt-data-lake-xxxx. You will need it for the next section.
  6. Now let's go back to the Redshift query editor. Search for Redshift in the search bar, choose your cluster, and select Query data.
  7. In your query editor, execute the query below to create the schemas that we will place your raw data into. You can highlight each statement and click Run to execute them individually. If you are in the classic query editor, you might need to input them separately. You should then see these schemas listed under dbtworkshop.
create schema if not exists jaffle_shop;
create schema if not exists stripe;
  8. Now create the tables in your schemas using the statements below. These will be created as tables in the respective schemas.

create table jaffle_shop.customers(
    id integer,
    first_name varchar(50),
    last_name varchar(50)
);

create table jaffle_shop.orders(
    id integer,
    user_id integer,
    order_date date,
    status varchar(50),
    _etl_loaded_at timestamp default current_timestamp
);

create table stripe.payment(
    id integer,
    orderid integer,
    paymentmethod varchar(50),
    status varchar(50),
    amount integer,
    created date,
    _batched_at timestamp default current_timestamp
);
  9. Now we need to copy the data from S3 into the tables. Be sure to update the S3 location, IAM role, and region. You can find the S3 bucket and IAM role in the outputs from your CloudFormation stack.

copy jaffle_shop.customers(id, first_name, last_name)
from 's3://dbt-data-lake-xxxx/jaffle_shop_customers.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1;

copy jaffle_shop.orders(id, user_id, order_date, status)
from 's3://dbt-data-lake-xxxx/jaffle_shop_orders.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1;

copy stripe.payment(id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-data-lake-xxxx/stripe_payments.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1;
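The three COPY statements above differ only in target table, column list, and file name. If you find yourself re-running the load, a small helper can generate them. This is a sketch under the same assumptions as above: the bucket, IAM role ARN, and region are placeholders you must replace with your CloudFormation outputs, and `copy_statement` is an illustrative helper, not a Redshift API:

```python
# Generate the three COPY statements from one template. The bucket, IAM role,
# and region defaults are the same placeholders used in the guide -- swap in
# your CloudFormation outputs before running the generated SQL.
TABLES = {
    "jaffle_shop.customers": (["id", "first_name", "last_name"],
                              "jaffle_shop_customers.csv"),
    "jaffle_shop.orders": (["id", "user_id", "order_date", "status"],
                           "jaffle_shop_orders.csv"),
    "stripe.payment": (["id", "orderid", "paymentmethod", "status",
                        "amount", "created"],
                       "stripe_payments.csv"),
}

def copy_statement(table, columns, filename,
                   bucket="s3://dbt-data-lake-xxxx",
                   iam_role="arn:aws:iam::XXXXXXXXXX:role/RoleName",
                   region="us-east-1"):
    """Render one Redshift COPY statement for a CSV with a header row."""
    return (
        f"copy {table}({', '.join(columns)})\n"
        f"from '{bucket}/{filename}'\n"
        f"iam_role '{iam_role}'\n"
        f"region '{region}'\n"
        "delimiter ','\n"
        "ignoreheader 1;"
    )

for table, (cols, fname) in TABLES.items():
    print(copy_statement(table, cols, fname))
    print()
```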

Ensure that you can run a select * from each of the tables with the following code snippets.

select * from jaffle_shop.customers;
select * from jaffle_shop.orders;
select * from stripe.payment;

Congratulations! At this point, you have created a Redshift instance and loaded training data. In the next section, we will walk through the next steps to connect dbt Cloud and Redshift.

Connecting to dbt Cloud

Create a dbt Cloud account

Let's start this section by creating a dbt Cloud account if you haven't already.

  1. Navigate to dbt Cloud.
  2. If you don't have a dbt Cloud account, create a new one, and verify your account via email.
  3. If you already have a dbt Cloud account, you can create a new project from your existing account:
    1. Click the gear icon in the top-right, then click Projects.
    2. Click + New Project.
  4. You've arrived at the "Setup a New Project" page.
  5. Type "Analytics" in the dbt Project Name field. You will be able to rename this project later.
  6. Click Continue.

Connect dbt Cloud to Redshift

Now let's set up the connection between dbt Cloud and Redshift.

  1. Click Redshift to set up your connection.

  2. For the name, write Redshift or another simple title.

  3. Enter your Redshift settings. Reference the credentials you saved from the CloudFormation stack outputs.

    • Your hostname is the entire hostname. Make sure to drop the http:// at the beginning and any trailing slashes at the end.
    • The port is 5439.
    • The database is dbtworkshop.
  4. Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be:

    • Username: dbtadmin
    • Password: this is the password that you set earlier in the guide
    • Schema: You'll notice that the schema name has been auto-created for you. By convention, this is dbt_<first-initial><last-name>. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
  5. Click Test Connection at the bottom. This verifies that dbt Cloud can access your Redshift cluster.

  6. If the connection test succeeds, click Next. If it fails, you may need to check your Redshift settings and credentials.

Initialize your repository and start development

When you develop in dbt Cloud, you can leverage Git to version control your code.

To connect to a repository, you can either set up a dbt Cloud-hosted managed repository or directly connect to a supported git provider. Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and continuous integration.

Setting up a managed repository

To set up a managed repository:

  1. Under "Add repository from", select Managed.
  2. Type a name for your repo, such as bbaggins-dbt-quickstart.
  3. Click Create. It will take a few seconds for your repository to be created and imported.
  4. Once you see the "Successfully imported repository" message, click Continue.

Initialize your dbt project

Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:

  1. Click Develop from the upper left. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
  2. Above the file tree to the left, click Initialize your project. This builds out your folder structure with example models.
  3. Make your initial commit by clicking Commit. Use the commit message initial commit. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
  4. Now you should be able to directly query data from your warehouse and execute dbt run. Paste the following warehouse-specific code into the IDE:
select * from jaffle_shop.customers
  5. In the command line bar at the bottom, type dbt run and press Enter. We will explore what happens in the next section of the tutorial.

Congratulations! You have successfully completed the following:

  • Set up a Redshift cluster
  • Loaded training data into your Redshift account
  • Connected dbt Cloud and Redshift

Next steps

You can continue the tutorial by building your first project. You will learn and apply the fundamentals of dbt with models, tests, docs, and deploying your first job!