Skip to main content

Advanced CI enterprise

Continuous integration workflows help increase the governance and improve the quality of the data. Additionally for these CI jobs, you can use Advanced CI features, such as compare changes, that provide details about the changes between what's currently in your production environment and the pull request's latest commit, giving you observability into how data changes are affected by your code changes. By analyzing the data changes that code changes produce, you can ensure you're always shipping trustworthy data products as you're developing.

How to enable this feature

You can opt into Advanced CI in dbt Cloud. Please refer to Account access to Advance CI features to learn how enable it in your dbt Cloud account.

More features

dbt Labs plans to provide additional Advanced CI features in the near future. More info coming soon.

Prerequisites

  • You have a dbt Cloud Enterprise account.
  • You have Advance CI features enabled.
  • You use a supported data platform: BigQuery, Databricks, Postgres, or Snowflake. Support for additional data platforms coming soon.

Compare changes feature

For CI jobs that have the dbt compare option enabled, dbt Cloud compares the changes between the last applied state of the production environment (defaulting to deferral for lower compute costs) and the latest changes from the pull request, whenever a pull request is opened or new commits are pushed.

dbt reports the comparison differences in:

  • dbt Cloud — Shows the changes (if any) to the data's primary keys, rows, and columns in the Compare tab from the Job run details page.
  • The pull request from your Git provider — Shows a summary of the changes as a Git comment.
Example of the Compare tabExample of the Compare tab

Optimizing comparisons

When an event_time column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (meaning the timeframe exists in both the CI and production environment), helping you avoid incorrect row-count changes and return results faster.

This is useful in scenarios like:

  • Subset of data in CI — When CI builds only a subset of data (like the most recent 7 days), compare changes would interpret the excluded data as "deleted rows." Configuring event_time allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
  • Fresher data in CI than in production — When your CI job includes fresher data than production (because it has run more recently), compare changes would flag the additional rows as "new" data, even though they’re just fresher data in CI. With event_time configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.
event_time ensures the same time-slice of data is accurately compared between your CI and production environments.event_time ensures the same time-slice of data is accurately compared between your CI and production environments.

About the cached data

After comparing changes, dbt Cloud stores a cache of no more than 100 records for each modified model for preview purposes. By caching this data, you can view the examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs). To display the changes, dbt Cloud uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.

You control what data to use. This may include synthetic data if pre-production or development data is heavily regulated or sensitive.

  • The selected data is cached on dbt Labs' systems for up to 30 days. No data is retained on dbt Labs' systems beyond this period.
  • The cache is encrypted and stored in an Amazon S3 or Azure blob storage in your account’s region.
  • dbt Labs will not access cached data from Advanced CI for its benefit and the data is only used to provide services as directed by you.
  • Third-party subcontractors, other than storage subcontractors, will not have access to the cached data.

If you access a CI job run that's more than 30 days old, you will not be able to see the comparison results. Instead, a message will appear indicating that the data has expired.

Example of message about expired data in the Compare tabExample of message about expired data in the Compare tab

Connection permissions

The compare changes feature uses the same credentials as the CI job, as defined in the CI job’s environment. The dbt Cloud administrator must ensure that client CI credentials are appropriately restricted since all customer's account users will be able to view the comparison results and the cached data.

If using dynamic data masking in the data warehouse, the cached data will no longer be dynamically masked in the Advanced CI output, depending on the permissions of the users who view it. dbt Labs recommends limiting user access to unmasked data or considering using synthetic data for the Advanced CI testing functionality.

Example of credentials in the user settingsExample of credentials in the user settings
0