BigQuery setup

profiles.yml file is for dbt Core users only

If you're using dbt Cloud, you don't need to create a profiles.yml file. This file is only for dbt Core users. To connect your data platform to dbt Cloud, refer to About data platforms.

Maintained by: dbt Labs
Authors: core dbt maintainers
GitHub repo: dbt-labs/dbt-adapters
PyPI package: dbt-bigquery
Slack channel: #db-bigquery
Supported dbt Core version: v0.10.0 and newer
dbt Cloud support: Supported
Minimum data platform version: n/a

Installing dbt-bigquery

Use pip to install the adapter. Before version 1.8, installing the adapter also installed dbt-core and any additional dependencies automatically. Beginning in 1.8, installing an adapter no longer installs dbt-core: adapter and dbt Core versions are now decoupled, and the installer should not overwrite an existing dbt-core installation. Install both packages explicitly with the following command:

python -m pip install dbt-core dbt-bigquery

Configuring dbt-bigquery

For BigQuery-specific configuration, please refer to BigQuery configs.

Authentication Methods

BigQuery targets can be specified using one of four methods:

OAuth via gcloud
OAuth token-based
service account file
service account json

For local development, we recommend using the OAuth method. If you're scheduling dbt on a server, you should use the service account auth method instead.

BigQuery targets should be set up using the following configuration in your profiles.yml file. There are a number of optional configurations you may specify as well.
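
As a rough sketch of the recommendation above, a single profile can hold an OAuth target for local development and a service account target for scheduled runs. The target names dev and prod, the dataset placeholders, and the keyfile path are illustrative assumptions.

```yaml
my-bigquery-db:
  target: dev # default target; select the other one with `dbt run --target prod`
  outputs:
    dev:
      type: bigquery
      method: oauth # local development via gcloud credentials
      project: GCP_PROJECT_ID
      dataset: DBT_DEV_DATASET_NAME # placeholder
      threads: 4
    prod:
      type: bigquery
      method: service-account # scheduled runs on a server
      project: GCP_PROJECT_ID
      dataset: DBT_PROD_DATASET_NAME # placeholder
      threads: 4
      keyfile: /PATH/TO/BIGQUERY/keyfile.json
```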

OAuth via gcloud

This connection method requires local OAuth via gcloud.

~/.dbt/profiles.yml

# Note that only one of these targets is required

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: GCP_PROJECT_ID
      dataset: DBT_DATASET_NAME # You can also use "schema" here
      threads: 4 # Must be a value of 1 or greater 
      OPTIONAL_CONFIG: VALUE

Default project

If you do not specify a project/database and are using the oauth method, dbt will use the default project associated with your user, as defined by gcloud config set.
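
A sketch of such a profile, with the project key left out so that dbt falls back to the gcloud default project. The dataset placeholder is the same one used in the example above.

```yaml
my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      # No `project` key: dbt uses the default project from your gcloud config
      dataset: DBT_DATASET_NAME
      threads: 4
```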

OAuth Token-Based

See docs on using OAuth 2.0 to access Google APIs.

Refresh token

Using the refresh token and client information, dbt will mint new access tokens as necessary.

~/.dbt/profiles.yml

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth-secrets
      project: GCP_PROJECT_ID
      dataset: DBT_DATASET_NAME # You can also use "schema" here
      threads: 4 # Must be a value of 1 or greater
      refresh_token: TOKEN
      client_id: CLIENT_ID
      client_secret: CLIENT_SECRET
      token_uri: TOKEN_URI # OAuth 2.0 token endpoint, not a redirect URI
      OPTIONAL_CONFIG: VALUE

Temporary token

dbt will use the temporary access token exactly as provided. This approach makes sense if you have an external deployment process that can mint new access tokens and update the profile file accordingly.

~/.dbt/profiles.yml

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth-secrets
      project: GCP_PROJECT_ID
      dataset: DBT_DATASET_NAME # You can also use "schema" here
      threads: 4 # Must be a value of 1 or greater
      token: TEMPORARY_ACCESS_TOKEN # refreshed + updated by external process
      OPTIONAL_CONFIG: VALUE
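
One way an external process could hand dbt a fresh token without rewriting the file is through an environment variable, which profiles.yml can read with dbt's env_var function. This is a sketch under that assumption; the variable name DBT_BIGQUERY_ACCESS_TOKEN is hypothetical.

```yaml
my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth-secrets
      project: GCP_PROJECT_ID
      dataset: DBT_DATASET_NAME
      threads: 4
      # Hypothetical variable name; the external process exports it before invoking dbt
      token: "{{ env_var('DBT_BIGQUERY_ACCESS_TOKEN') }}"
```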

Service Account File

~/.dbt/profiles.yml

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: GCP_PROJECT_ID
      dataset: DBT_DATASET_NAME
      threads: 4 # Must be a value of 1 or greater
      keyfile: /PATH/TO/BIGQUERY/keyfile.json
      OPTIONAL_CONFIG: VALUE

Service Account JSON

Note

This authentication method is only recommended for production environments where using a Service Account Keyfile is impractical.

~/.dbt/profiles.yml

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account-json
      project: GCP_PROJECT_ID
      dataset: DBT_DATASET_NAME
      threads: 4 # Must be a value of 1 or greater
      OPTIONAL_CONFIG: VALUE

      # These fields come from the service account json keyfile
      keyfile_json:
        type: xxx
        project_id: xxx
        private_key_id: xxx
        private_key: xxx
        client_email: xxx
        client_id: xxx
        auth_uri: xxx
        token_uri: xxx
        auth_provider_x509_cert_url: xxx
        client_x509_cert_url: xxx

Optional configurations

Priority

The priority for the BigQuery jobs that dbt executes can be configured with the priority configuration in your BigQuery profile. The priority field can be set to one of batch or interactive. For more information on query priority, consult the BigQuery documentation.

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      priority: interactive
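
For scheduled runs where latency matters less, the same field could be set to batch instead. A minimal sketch, reusing the placeholder project and dataset from the example above:

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      priority: batch # the other allowed value besides interactive
```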

Timeouts and Retries

The dbt-bigquery plugin uses the BigQuery Python client library to submit queries. Each query requires two steps:

Job creation: Submit the query job to BigQuery, and receive its job ID.
Job execution: Wait for the query job to finish executing, and receive its result.

Some queries inevitably fail, at different points in the process. To handle these cases, dbt supports fine-grained configuration for query timeouts and retries.

job_execution_timeout_seconds

Use the job_execution_timeout_seconds configuration to set the number of seconds dbt should wait for a query to complete after it has been submitted successfully. Of the four configurations that control timeouts and retries, this is the one most commonly used.

Renamed config

In older versions of dbt-bigquery, this same config was called timeout_seconds.

No timeout is set by default. (For historical reasons, some query types use a default of 300 seconds when job_execution_timeout_seconds is not set.) When you do set job_execution_timeout_seconds and a dbt query takes longer than the configured number of seconds to finish, the dbt-bigquery adapter raises an exception:

 Operation did not complete within the designated timeout.

Note

The job_execution_timeout_seconds value represents the number of seconds to wait for the underlying HTTP transport. It doesn't represent the maximum allowable time for a BigQuery job itself. Normally, BigQuery would keep running the job after this timeout is reached, but dbt-bigquery sends BigQuery a request to cancel it.

You can change the timeout seconds for the job execution step by configuring job_execution_timeout_seconds in the BigQuery profile:

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      job_execution_timeout_seconds: 600 # 10 minutes

job_creation_timeout_seconds

It is also possible for a query job to fail to submit in the first place. You can configure the maximum timeout for the job creation step by configuring job_creation_timeout_seconds. No timeout is set by default.

In the job creation step, dbt is simply submitting a query job to BigQuery's Jobs.Insert API, and receiving a query job ID in return. It should take a few seconds at most. In some rare situations, it could take longer.
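
A minimal sketch of setting this value on its own. The 30-second limit is an illustrative choice, and the project and dataset are the placeholders used elsewhere on this page.

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      job_creation_timeout_seconds: 30 # give up if the job cannot be submitted within 30 seconds
```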

job_retries

Google's BigQuery Python client has native support for retrying query jobs that time out, or queries that run into transient errors and are likely to succeed if run again. You can configure the maximum number of retries by configuring job_retries.

Renamed config

In older versions of dbt-bigquery, the job_retries config was just called retries.

The default value is 1, meaning that dbt will retry failing queries exactly once. You can set the configuration to 0 to disable retries entirely.
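
For instance, a profile that turns retries off entirely, so that a failing query surfaces immediately, might look like the sketch below. The project and dataset values are the placeholders used elsewhere on this page.

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      job_retries: 0 # disable retries; the default is 1
```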

job_retry_deadline_seconds

After a query job times out, or encounters a transient error, dbt will wait one second before retrying the same query. In cases where queries are repeatedly timing out, this can add up to a long wait. Use the job_retry_deadline_seconds configuration to cap the total number of seconds you're willing to wait (the "deadline") while retrying the same query. If dbt hits the deadline, it will give up and return an error.

Combining the four configurations above, we can maximize our chances of mitigating intermittent query errors. In the example below, we will wait up to 30 seconds for initial job creation. Then, we'll wait up to 10 minutes (600 seconds) for the query to return results. If the query times out, or encounters a transient error, we will retry it up to 5 times. The whole process cannot take longer than 20 minutes (1200 seconds). At that point, dbt will raise an error.

profiles.yml

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      job_creation_timeout_seconds: 30
      job_execution_timeout_seconds: 600
      job_retries: 5
      job_retry_deadline_seconds: 1200

Dataset locations

The location of BigQuery datasets can be configured using the location configuration in a BigQuery profile. location may be either a multi-regional location (e.g. EU, US) or a regional location (e.g. us-west2), as described in the BigQuery documentation. Example:

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      location: US # Optional, one of US or EU, or a regional location

Maximum Bytes Billed

When a maximum_bytes_billed value is configured for a BigQuery profile, queries executed by dbt will fail if they exceed the configured maximum bytes threshold. This configuration should be supplied as an integer number of bytes.

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      # If a query would bill more than a gigabyte of data, then
      # BigQuery will reject the query
      maximum_bytes_billed: 1000000000

Example output

Database Error in model debug_table (models/debug_table.sql)
  Query exceeded limit for bytes billed: 1000000000. 2000000000 or higher required.
  compiled SQL at target/run/bq_project/models/debug_table.sql
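
Because the value is a plain byte count, larger limits have to be written out in full. The sketch below sets a 10 GiB cap, assuming binary units are wanted (10 × 1024³ = 10,737,418,240 bytes); the example above uses a decimal gigabyte instead.

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      # 10 GiB = 10 * 1024 * 1024 * 1024 bytes
      maximum_bytes_billed: 10737418240
```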

OAuth 2.0 Scopes for Google APIs

By default, the BigQuery connector requests three OAuth scopes, namely https://www.googleapis.com/auth/bigquery, https://www.googleapis.com/auth/cloud-platform, and https://www.googleapis.com/auth/drive. These scopes were originally added to provide access for the models that are reading from Google Sheets. However, in some cases, a user may need to customize the default scopes (for example, to reduce them down to the minimal set needed). By using the scopes profile configuration you are able to set up your own OAuth scopes for dbt. Example:

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      scopes:
        - https://www.googleapis.com/auth/bigquery
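
If some models do read from Google Sheets, a narrower alternative to the default scopes might keep BigQuery access and add read-only Drive access. This is a sketch that assumes read-only Drive access is sufficient for your Sheets-backed sources; verify that against your own project before relying on it.

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      scopes:
        - https://www.googleapis.com/auth/bigquery
        # Assumption: read-only Drive access is enough for Sheets-backed sources
        - https://www.googleapis.com/auth/drive.readonly
```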

Service Account Impersonation

This feature allows users authenticating via local OAuth to access BigQuery resources based on the permissions of a service account.

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      impersonate_service_account: dbt-runner@yourproject.iam.gserviceaccount.com

For a general overview of this process, see the official docs for Creating Short-lived Service Account Credentials.

Execution project

By default, dbt will use the specified project/database as both:

The location to materialize resources (models, seeds, snapshots, etc), unless they specify a custom project/database config
The GCP project that receives the bill for query costs or slot usage

Optionally, you may specify an execution_project to bill for query execution, instead of the project/database where you materialize most resources.

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      execution_project: buck-stops-here-456

Quota project

By default, dbt will use the quota_project_id set within the credentials of the account you are using to authenticate to BigQuery.

Optionally, you may specify quota_project to bill for query execution instead of the default quota project specified for the account from the environment.

This can sometimes be required when impersonating service accounts that do not have the BigQuery API enabled within the project in which they are defined. Without overriding the quota project, dbt will fail to connect.

If you choose to set a quota project, the account you use to authenticate must have the Service Usage Consumer role on that project.

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      quota_project: my-bq-quota-project
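
For the impersonation scenario described above, the two settings can appear together in one target. A sketch, reusing the placeholder service account and quota project names from the examples on this page (they are not real resources):

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      # Act with the permissions of this service account
      impersonate_service_account: dbt-runner@yourproject.iam.gserviceaccount.com
      # Override the default quota project; the authenticating account needs
      # the Service Usage Consumer role on this project
      quota_project: my-bq-quota-project
```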

Running Python models on Dataproc

To run dbt Python models on GCP, dbt uses companion services, Dataproc and Cloud Storage, that offer tight integrations with BigQuery. You may use an existing Dataproc cluster and Cloud Storage bucket, or create new ones.

Then, add the bucket name, cluster name, and cluster region to your connection profile:

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      
      # for dbt Python models to be run on a Dataproc cluster
      gcs_bucket: dbt-python
      dataproc_cluster_name: dbt-python
      dataproc_region: us-central1

Alternatively, Dataproc Serverless can be used:

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset
      
      # for dbt Python models to be run on Dataproc Serverless
      gcs_bucket: dbt-python
      dataproc_region: us-central1
      submission_method: serverless
      dataproc_batch:
        batch_id: MY_CUSTOM_BATCH_ID # Supported in v1.7+
        environment_config:
          execution_config:
            service_account: dbt@abc-123.iam.gserviceaccount.com
            subnetwork_uri: regions/us-central1/subnetworks/dataproc-dbt
        labels:
          project: my-project
          role: dev
        runtime_config:
          properties:
            spark.executor.instances: "3"
            spark.driver.memory: 1g

For a full list of possible configuration fields that can be passed in dataproc_batch, refer to the Dataproc Serverless Batch documentation.

Required permissions

BigQuery's permission model differs from that of more conventional databases like Snowflake and Redshift. dbt user accounts require the following roles:

BigQuery Data Editor
BigQuery User

This set of permissions allows dbt users to read from and create tables and views in a BigQuery project.

Local OAuth gcloud setup

To connect to BigQuery using the oauth method, follow these steps:

Make sure the gcloud command is installed on your computer.
Activate the application-default account with the following command:

gcloud auth application-default login \
  --scopes=https://www.googleapis.com/auth/bigquery,\
https://www.googleapis.com/auth/drive.readonly,\
https://www.googleapis.com/auth/iam.test

A browser window should open, and you should be prompted to log into your Google account. Once you've done that, dbt will use your OAuth'd credentials to connect to BigQuery!

This command uses the --scopes flag to request access to Google Sheets. This makes it possible to transform data in Google Sheets using dbt. If your dbt project does not transform data in Google Sheets, then you may omit the --scopes flag.
