BigQuery setup
Overview of dbt-bigquery
- Maintained by: dbt Labs
- Authors: core dbt maintainers
- GitHub repo: dbt-labs/dbt-bigquery
- PyPI package:
dbt-bigquery
- Slack channel: #db-bigquery
- Supported dbt Core version: v0.10.0 and newer
- dbt Cloud support: Supported
- Minimum data platform version: n/a
Installing dbt-bigquery
pip is the easiest way to install the adapter:
pip install dbt-bigquery
Installing dbt-bigquery
will also install dbt-core
and any other dependencies.
Configuring dbt-bigquery
For Big Query-specifc configuration please refer to Big Query Configuration
For further info, refer to the GitHub repository: dbt-labs/dbt-bigquery
Authentication Methods
BigQuery targets can be specified using one of four methods:
For local development, we recommend using the oauth method. If you're scheduling dbt on a server, you should use the service account auth method instead.
BigQuery targets should be set up using the following configuration in your profiles.yml
file. There are a number of optional configurations you may specify as well.
OAuth via gcloud
This connection method requires local OAuth via gcloud
.
# Note that only one of these targets is required
my-bigquery-db:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: [GCP project id]
dataset: [the name of your dbt dataset] # You can also use "schema" here
threads: [1 or more]
<optional_config>: <value>
Default project
If you do not specify a project
/database
and are using the oauth
method, dbt will use the default project
associated with your user, as defined by gcloud config set
.
Oauth Token-Based
See docs on using Oauth 2.0 to access Google APIs.
- Refresh token
- Temporary token
Using the refresh token and client information, dbt will mint new access tokens as necessary.
my-bigquery-db:
target: dev
outputs:
dev:
type: bigquery
method: oauth-secrets
project: [GCP project id]
dataset: [the name of your dbt dataset] # You can also use "schema" here
threads: [1 or more]
refresh_token: [token]
client_id: [client id]
client_secret: [client secret]
token_uri: [redirect URI]
<optional_config>: <value>
dbt will use the one-time access token, no questions asked. This approach makes sense if you have an external deployment process that can mint new access tokens and update the profile file accordingly.
my-bigquery-db:
target: dev
outputs:
dev:
type: bigquery
method: oauth-secrets
project: [GCP project id]
dataset: [the name of your dbt dataset] # You can also use "schema" here
threads: [1 or more]
token: [temporary access token] # refreshed + updated by external process
<optional_config>: <value>
Service Account File
my-bigquery-db:
target: dev
outputs:
dev:
type: bigquery
method: service-account
project: [GCP project id]
dataset: [the name of your dbt dataset]
threads: [1 or more]
keyfile: [/path/to/bigquery/keyfile.json]
<optional_config>: <value>
Service Account JSON
This authentication method is only recommended for production environments where using a Service Account Keyfile is impractical.
my-bigquery-db:
target: dev
outputs:
dev:
type: bigquery
method: service-account-json
project: [GCP project id]
dataset: [the name of your dbt dataset]
threads: [1 or more]
<optional_config>: <value>
# These fields come from the service account json keyfile
keyfile_json:
type: xxx
project_id: xxx
private_key_id: xxx
private_key: xxx
client_email: xxx
client_id: xxx
auth_uri: xxx
token_uri: xxx
auth_provider_x509_cert_url: xxx
client_x509_cert_url: xxx
Optional configurations
Priority
The priority
for the BigQuery jobs that dbt executes can be configured with the priority
configuration in your BigQuery profile. The priority
field can be set to one of batch
or interactive
. For more information on query priority, consult the BigQuery documentation.
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset
priority: interactive
Timeouts and Retries
Dataset locations
The location of BigQuery datasets can be configured using the location
configuration in a BigQuery profile.
location
may be either a multi-regional location (e.g. EU
, US
), or a regional location (e.g. us-west2
) as per the BigQuery documentation describes.
Example:
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset
location: US # Optional, one of US or EU, or a regional location
Maximum Bytes Billed
When a maximum_bytes_billed
value is configured for a BigQuery profile,
queries executed by dbt will fail if they exceed the configured maximum bytes
threshhold. This configuration should be supplied as an integer number
of bytes.
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset
# If a query would bill more than a gigabyte of data, then
# BigQuery will reject the query
maximum_bytes_billed: 1000000000
Example output
Database Error in model debug_table (models/debug_table.sql)
Query exceeded limit for bytes billed: 1000000000. 2000000000 or higher required.
compiled SQL at target/run/bq_project/models/debug_table.sql
OAuth 2.0 Scopes for Google APIs
By default, the BigQuery connector requests three OAuth scopes, namely https://www.googleapis.com/auth/bigquery
, https://www.googleapis.com/auth/cloud-platform
, and https://www.googleapis.com/auth/drive
. These scopes were originally added to provide access for the models that are reading from Google Sheets. However, in some cases, a user may need to customize the default scopes (for example, to reduce them down to the minimal set needed). By using the scopes
profile configuration you are able to set up your own OAuth scopes for dbt. Example:
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset
scopes:
- https://www.googleapis.com/auth/bigquery
Service Account Impersonation
This feature allows users authenticating via local oauth to access BigQuery resources based on the permissions of a service account.
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset
impersonate_service_account: dbt-runner@yourproject.iam.gserviceaccount.com
For a general overview of this process, see the official docs for Creating Short-lived Service Account Credentials.
Execution project
By default, dbt will use the specified project
/database
as both:
- The location to materialize resources (models, seeds, snapshots, etc), unless they specify a custom
project
/database
config - The GCP project that receives the bill for query costs or slot usage
Optionally, you may specify an execution_project
to bill for query execution, instead of the project
/database
where you materialize most resources.
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset
execution_project: buck-stops-here-456
Required permissions
BigQuery's permission model is dissimilar from more conventional databases like Snowflake and Redshift. The following permissions are required for dbt user accounts:
- BigQuery Data Editor
- BigQuery User
This set of permissions will permit dbt users to read from and create tables and views in a BigQuery project.
Local OAuth gcloud setup
To connect to BigQuery using the oauth
method, follow these steps:
- Make sure the
gcloud
command is installed on your computer - Activate the application-default account with
gcloud auth application-default login \
--scopes=https://www.googleapis.com/auth/bigquery,\
https://www.googleapis.com/auth/drive.readonly,\
https://www.googleapis.com/auth/iam.test
A browser window should open, and you should be prompted to log into your Google account. Once you've done that, dbt will use your oauth'd credentials to connect to BigQuery!
This command uses the --scopes
flag to request access to Google Sheets. This makes it possible to transform data in Google Sheets using dbt. If your dbt project does not transform data in Google Sheets, then you may omit the --scopes
flag.