To defer or to clone, that is the question
Hi all, I’m Kshitij, a senior software engineer on the Core team at dbt Labs.
One of the coolest moments of my career here thus far has been shipping the new dbt clone command as part of the dbt-core v1.6 release.
However, one of the requests I’ve received most frequently is for guidance on “when” to clone, which goes beyond the documentation on “how” to clone. In this blog post, I’ll attempt to provide this guidance by answering these FAQs:
- What is dbt clone?
- How is it different from deferral?
- Should I defer or should I clone?
What is dbt clone?
dbt clone is a new command in dbt 1.6 that leverages native zero-copy clone functionality on supported warehouses to copy entire schemas for free, almost instantly.
How is this possible?
Well, the warehouse “cheats” by only copying metadata from the source schema to the target schema; the underlying data remains at rest during this operation.
This metadata includes materialized objects like tables and views, which is why you see a clone of these objects in the target schema.
In computer science jargon, clone makes a copy of the pointer from the source schema to the underlying data; after the operation there are now two pointers (source and target schemas) that each point to the same underlying data.
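The pointer analogy can be made concrete with a toy sketch. Everything here (`storage`, `prod_schema`, `clone_schema`) is a hypothetical stand-in for illustration; no warehouse stores its metadata this way.

```python
# Toy model of zero-copy cloning: schemas hold pointers to data, not the data itself.
# All names are hypothetical; this mimics the idea, not any warehouse's internals.

# "Storage" holds the actual rows; each entry stands for data at rest.
storage = {"blob_1": [("alice", 10), ("bob", 20)]}

# A schema is just metadata: object name -> pointer (a storage key).
prod_schema = {"orders": "blob_1"}


def clone_schema(source: dict[str, str]) -> dict[str, str]:
    """Copy only the metadata (the pointers), never the underlying rows."""
    return dict(source)


staging_schema = clone_schema(prod_schema)

# Two distinct schemas now exist...
assert staging_schema is not prod_schema
# ...but both point at the very same underlying data.
assert storage[staging_schema["orders"]] is storage[prod_schema["orders"]]
# Nothing was added to storage, which is why the clone is cheap and near-instant.
assert len(storage) == 1
```

The clone operation touches only the small metadata dictionary, so its cost does not depend on how many rows sit behind each pointer.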
How is cloning different from deferral?
On the surface, cloning and deferral seem similar – they’re both ways to save costs in the data warehouse. They do this by bypassing expensive model re-computations – clone by eagerly copying an entire schema into the target schema, and defer by lazily referencing pre-built models in the source schema.
Let’s unpack this sentence and explore its first-order effects:
[Table: first-order effects of clone and defer]
These first-order effects lead to the following second-order effects that truly distinguish clone and defer from each other:
[Table: second-order effects of clone and defer]
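To make the eager versus lazy distinction concrete, here is a toy sketch. The schema names, model names, and both functions are hypothetical; the code only mimics the idea and is not how dbt implements either feature.

```python
# Toy comparison of clone (eager) and defer (lazy). Hypothetical names throughout;
# this illustrates the concept only, not dbt's actual implementation.

PROD = "prod"
DEV = "dev"

# Models already built in production, and the one model we changed locally.
prod_models = ["stg_orders", "stg_customers", "fct_orders"]
modified = {"fct_orders"}


def clone(models: list[str], target: str) -> dict[str, str]:
    """Eager: create an object in the target schema for every model up front."""
    return {m: f"{target}.{m}" for m in models}


def resolve_with_defer(model: str, built_in_target: set[str]) -> str:
    """Lazy: point a reference at prod unless the model was built in the target."""
    schema = DEV if model in built_in_target else PROD
    return f"{schema}.{model}"


cloned = clone(prod_models, DEV)
deferred = {m: resolve_with_defer(m, modified) for m in prod_models}

print(cloned["stg_orders"])    # dev.stg_orders: the object exists in the target schema
print(deferred["stg_orders"])  # prod.stg_orders: nothing was created in the target
print(deferred["fct_orders"])  # dev.fct_orders: only the modified model lives in dev
```

With clone, every object ends up in the target schema whether or not anything reads it; with defer, the target schema contains only what was actually built there.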
Should I defer or should I clone?
Putting together all the points above, here’s a handy cheat sheet for when to defer and when to clone:
[Table: cheat sheet for when to defer and when to clone]
To absolutely drive this point home:
- If you send someone this cheat sheet by linking to this page, you are deferring to this page
- If you print out this page and write notes in the margins, you have cloned this page
Putting it in practice
Using the cheat sheet above, let’s walk through a few common scenarios and decide whether to use defer or clone for each:
- Testing staging datasets in BI
In this scenario, we want to:
- Make a copy of our production dataset available in our downstream BI tool
- Safely iterate on this copy without breaking production datasets
Therefore, we should use clone in this scenario.
- Continuous integration (CI)
In this scenario, we want to:
- Refer to production models wherever possible to speed up continuous integration (CI) runs
- Only run and test models in the CI staging environment that have changed from the production environment
- Reference models from different environments – prod for unchanged models, and staging for modified models
Therefore, we should use defer in this scenario.
Related: “dbt clone in CI jobs to test incremental models”. Learn how to use dbt clone in CI jobs to efficiently test modified incremental models, simulating post-merge behavior while avoiding full-refresh costs.
- Blue-green deployments
In this scenario, we want to:
- Ensure that all tests are always passing on the production dataset, even if that dataset is slightly stale
- Atomically roll back a promotion to production if tests aren’t passing across the entire staging dataset
In this scenario, we can use clone to implement a deployment strategy known as blue-green deployment: we build the entire staging dataset, run tests against it, and clone it over to production only if all tests pass.
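The blue-green flow described above can be sketched as a small control function. The function name and its three callables are hypothetical placeholders; in a real project each step would presumably wrap a dbt invocation, which is left out here.

```python
from typing import Callable


def blue_green_deploy(
    build_staging: Callable[[], None],
    run_tests: Callable[[], bool],
    clone_to_prod: Callable[[], None],
) -> bool:
    """Promote staging to production only if every test passes.

    Returns True when the promotion happened, False when production was left untouched.
    """
    build_staging()      # build the entire staging dataset first
    if not run_tests():  # test the staging dataset as a whole
        return False     # production keeps serving the last good (possibly stale) data
    clone_to_prod()      # cheap, metadata-only promotion
    return True


# Usage with stand-in steps that only record what happened.
log: list[str] = []
promoted = blue_green_deploy(
    build_staging=lambda: log.append("build"),
    run_tests=lambda: False,  # simulate a failing test
    clone_to_prod=lambda: log.append("clone"),
)
assert promoted is False and log == ["build"]
```

Because the clone step runs only after the whole staging dataset has passed its tests, a failure never leaves production half-updated.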
As a rule of thumb, deferral lends itself better to continuous integration (CI) use cases whereas cloning lends itself better to continuous deployment (CD) use cases.
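In command terms, the two halves of this rule of thumb might look roughly like the sketch below, which only assembles the argument lists and does not run dbt. The flags (`--select state:modified+`, `--defer`, `--state`) follow my understanding of the dbt CLI; the helper names and the `prod-run-artifacts` directory are hypothetical, so check the invocations against the documentation for your dbt version.

```python
import shlex


def ci_defer_command(prod_artifacts_dir: str) -> list[str]:
    """Build a dbt invocation that builds only modified models and defers the rest."""
    return [
        "dbt", "build",
        "--select", "state:modified+",  # changed models plus everything downstream
        "--defer",                      # resolve unbuilt refs against the state manifest
        "--state", prod_artifacts_dir,  # directory holding the production artifacts
    ]


def cd_clone_command(source_artifacts_dir: str) -> list[str]:
    """Build a dbt invocation that clones the models described by the given artifacts."""
    return ["dbt", "clone", "--state", source_artifacts_dir]


# "prod-run-artifacts" is a hypothetical directory name used for illustration.
print(shlex.join(ci_defer_command("prod-run-artifacts")))
print(shlex.join(cd_clone_command("prod-run-artifacts")))
```

Either list could be handed to `subprocess.run` in a CI or CD job once the artifacts directory points at real run artifacts.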
Wrapping Up
In this post, we covered what dbt clone is, how it is different from deferral, and when to use each. Often, they can be used together within the same project in different parts of the deployment lifecycle.
Thanks for reading, and I look forward to seeing what you build with dbt clone.
Thanks to Jason Ganz and Gwen Windflower for reviewing drafts of this article.

