Skip to main content

Set up and connect BigQuery

Introduction

For the BigQuery project in the getting started guide, you'll learn how to set up BigQuery and connect it to dbt Cloud.

This guide will walk you through:

  • Setting up a new BigQuery instance
  • Accessing sample date in a public data set
  • Connecting dbt Cloud to BigQuery

Prerequisites

Before beginning this guide, make sure that you have access to new or existing Google account. You can use a personal or work account to set up BigQuery through Google Cloud Platform (GCP).

Setting up

Before jumping into the steps below, login to your Google account.

  1. Navigate to the BigQuery Console.

    • If you don't have a Google Cloud Platform account you will be asked to create one.
    • If you do have one (or multiple) it will likely log you into your oldest account. Click your profile picture to the right and verify your are using the correct email account.
  2. Create a new project for this tutorial:

    • If you've just created a BigQuery account, you'll be prompted to create a new project straight away.
    • If you already have an existing organization, you can select the project drop down in the header bar, and create a new project from there.
      Bigquery Project Dropdown

      Bigquery Project Dropdown

  3. Select NEW PROJECT. You should see a project name automatically populate. You can change the name to something more relevant, for example "dbt Learn - Bigquery Setup."

    Bigquery New Project Creation

    Bigquery New Project Creation

  4. Click Create.

Loading data

BigQuery supports public data sets that can be directly queried, so we will show you how to access those datasets via select statements. Additionally, we will show you how to populate your own database objects with that data.

  1. Navigate to the BigQuery Console again. Make sure your new project is selected in the header. If you do not see your account or project, click your profile picture to the right and verify your are using the correct email account.

  2. Copy and paste the below queries into the Query Editor to validate that you are able to run them successfully.

    select * from `dbt-tutorial.jaffle_shop.customers`;
    select * from `dbt-tutorial.jaffle_shop.orders`;
    select * from `dbt-tutorial.stripe.payment`;
  3. Verify you can see an output:

    Bigquery Query Results

    Bigquery Query Results

  4. Create datasets. Datasets in BigQuery are equivalent to schemas in a traditional database.

    1. Find your project in the picker. Click the three dots to expose options.
    2. Click Create dataset.
      Bigquery Create Dataset

      Bigquery Create Dataset

    3. Fill in Dataset ID as required. This will be used like schema in fully qualified references to your database objects, i.e. database.schema.table, so choose a name that fits the purpose, in this case we will be creating one now for jaffle_shop and one for stripe later.
      Bigquery Create Dataset ID

      Bigquery Create Dataset ID

    4. Leave the default values in the rest of the fields:
      • Data location can be left blank -- if selected, this determines the GCP location where your data is stored. The current default location is the US multi-region. All tables within this dataset will share this location.
      • Even though it is unchecked, billing table expiration will be set automatically to 60 days, because billing has not been enabled for this project, so GCP defaults to deprecating tables.
      • Allow Google to manage encryption.
      • Click CREATE DATASET.
      • Repeat steps i through iv for the second dataset, stripe.

Connecting to dbt Cloud

You will learn how to connect dbt Cloud to Google BigQuery so that you can leverage the power of dbt to transform data in BigQuery.

Generate BigQuery credentials

In order to let dbt connect to your warehouse, you'll need to generate a keyfile. This is analogous to using a database user name and password with most other data warehouses.

  1. Go to the BigQuery credential wizard. Make sure your new project is selected in the header. If you do not see your account or project, click your profile picture to the right and verify your are using the correct email account.
  2. Select + Create Credentials then select Service account.
  3. Type dbt-user in the Service account name field, then click Create and Continue.
  4. Type and select BigQuery Admin in the Role field.
  5. Click Continue.
  6. Leave fields blank in the "Grant users access to this service account" section and click Done.
  7. Click the service account that you just created.
  8. Select Keys.
  9. Click Add Key then select Create new key.
  10. Select JSON as the key type then click Create.
  11. You should be prompted to download the JSON file. Save it locally to an easy-to-remember spot, with a clear filename. For example, dbt-user-creds.json.

Create a dbt Cloud account

Let's start this section by creating a dbt Cloud account if you haven't already.

  1. Navigate to dbt Cloud.
  2. If you don't have a dbt account, create a new one and verify your account via email.
  3. If you have a dbt Cloud account, you can create a new project from your existing account:
    1. Click hamburger icon, then click Account Settings.
    2. Click New project, then click Begin.
  4. You can now use the Set Up a New Project workflow to set up your first dbt Cloud project!
  5. Type "Analytics" in the dbt Project Name field. You will be able to rename this project later.
  6. Click Continue.

Connect dbt Cloud to BigQuery

Now let's set up the connection between dbt Cloud and BigQuery.

  1. Click BigQuery to set up your connection.
  2. Click Upload a Service Account JSON File in BigQuery settings.
  3. Select the JSON file you downloaded in Generate BigQuery Credentials. dbt Cloud will fill in all the necessary fields.
  4. Click Test at the top. This verifies that dbt Cloud can access your BigQuery account.
  5. If you see "Connection test Succeeded!" then click Continue. If it fails, you might need to go back and regenerate your BigQuery credentials.

Initialize your repository and start development

Setting up a managed repository

dbt Cloud uses Git for version control, but using a managed repository makes this easier. To set up a managed repository:

  1. Under "Add repository from", select Managed.
  2. Type a name for your repo such as bbaggins-dbt-quickstart
  3. Click Create. It will take a few seconds for your repository to be created and imported.
  4. Once you see the "Successfully imported repository," click Continue.

Initialize your dbt project

Now that you have a repository configured, you can initialize your project and start development in the dbt Cloud IDE:

  1. Click Start Developing. You can also navigate to the Develop page from hamburger menu in the top left. It might take a few minutes for your project to spin up for the first time, as it established your git connection, clones your repo, and tests the connection to the warehouse.
  2. Above the file tree to the left, click Initialize your project. This builds out your folder structure with example models.
  3. Make your initial commit by clicking Commit. Use the commit message initial commit. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
  4. Now you should be able to directly query data from your warehouse and execute dbt run. You can try this out now:
    • In "Scratchpad 1", delete all text and paste your warehouse-specific code into Scratchpad 1:
select * from `dbt-tutorial.jaffle_shop.customers`
  • In the command line bar at the bottom, type dbt run and click Enter. We will explore what happens in the next section of the tutorial.

Congratulations! You have successfully completed the following:

  • Set up a new BigQuery instance
  • Accessing sample date in a public data set
  • Connected dbt Cloud to BigQuery

Next steps

You can Continue the tutorial by building your first project. You will learn and apply the fundamentals of dbt with models, tests, docs, and deploying your first job!