Developer Blog | dbt Developer Hub

Getting Started with git Branching Strategies and dbt

March 10, 2025 · 33 min read

Christine Berger

Resident Architect at dbt Labs

Carol Ohms

Resident Architect at dbt Labs

Taylor Dunlap

Senior Solutions Architect at dbt Labs

Steve Dowling

Senior Solutions Architect at dbt Labs

Hi! We’re Christine and Carol, Resident Architects at dbt Labs. Our day-to-day work is all about helping teams reach their technical and business-driven goals. Collaborating with a broad spectrum of customers ranging from scrappy startups to massive enterprises, we’ve gained valuable experience guiding teams to implement architecture which addresses their major pain points.

The information we’re about to share isn't just from our experiences - we frequently collaborate with other experts like Taylor Dunlap and Steve Dowling who have greatly contributed to the amalgamation of this guidance. Their work lies in being the critical bridge for teams between implementation and business outcomes, ultimately leading teams to align on a comprehensive technical vision through identification of problems and solutions.

Why are we here?
We help teams with dbt architecture, which encompasses the tools, processes and configurations used to start developing and deploying with dbt. There’s a lot of decision making that happens behind the scenes to standardize on these pieces - much of which is informed by understanding what we want the development workflow to look like. The focus on having the perfect workflow often gets teams stuck in heaps of planning and endless conversations, which slows down or even stops momentum on development. If you feel this, we’re hoping our guidance will give you a great sense of comfort in taking steps to unblock development - even when you don’t have everything figured out yet!

Parser, Better, Faster, Stronger: A peek at the new dbt engine

February 19, 2025 · 5 min read

Joel Labes

Staff Developer Experience Advocate at dbt Labs

Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back.

Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it

After a series of deep dives into the guts of SQL comprehension, let's talk about speed a little bit. Specifically, I want to talk about one of the most annoying slowdowns as your project grows: project parsing.

When you're waiting a few seconds or a few minutes for things to start happening after you invoke dbt, it's because parsing isn't finished yet. But Lukas' SDF demo at last month's webinar didn't have a big wait, so why not?

The key technologies behind SQL Comprehension

January 24, 2025 · 16 min read

Dave Connors

Product Manager at dbt Labs

You ever wonder what’s really going on in your database when you fire off a (perfect, efficient, full-of-insight) SQL query?

OK, probably not 😅. Your personal tastes aside, we’ve been talking a lot about SQL Comprehension tools at dbt Labs in the wake of our acquisition of SDF Labs, and think that the community would benefit if we included them in the conversation too! We recently published a blog that talked about the different levels of SQL Comprehension tools. If you read that, you may have encountered a few new terms you weren’t super familiar with.

In this post, we’ll talk about the technologies that underpin SQL Comprehension tools in more detail. Hopefully, you come away with a deeper understanding of and appreciation for the hard work that your computer does to turn your SQL queries into actionable business insights!

The Three Levels of SQL Comprehension: What they are and why you need to know about them

January 23, 2025 · 9 min read

Joel Labes

Staff Developer Experience Advocate at dbt Labs

Ever since dbt Labs acquired SDF Labs last week, I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are fascinating.

For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a string preprocessor and into fully comprehending SQL. For the first time, SDF provides the technology necessary to make this possible. Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.

Why I wish I had a control plane for my renovation

January 21, 2025 · 4 min read

Mark Wan

Senior Solutions Architect at dbt Labs

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.

My wife pondering our sanity

We had to coordinate multiple elements:

The architects, who designed the layout, interior, and exterior.
The architectural plans, which outlined what the house should look like.
The builders, who executed those plans.
The inspectors, councils, and energy raters, who checked whether everything met the required standards.

Test smarter not harder: Where should tests go in your pipeline?

December 9, 2024 · 8 min read

Faith McKenna

Senior Technical Instructor at dbt Labs

Jerrie Kumalah Kenney

Resident Architect at dbt Labs

👋 Greetings, dbt’ers! It’s Faith & Jerrie, back again to offer tactical advice on where to put tests in your pipeline.

In our first post on refining testing best practices, we developed a prioritized list of data quality concerns. We also documented first steps for debugging each concern. This post will guide you on where specific tests should go in your data pipeline.

Note that we are constructing this guidance based on how we structure data at dbt Labs. You may use a different modeling approach—that’s okay! Translate our guidance to your data’s shape, and let us know in the comments section what modifications you made.

First, here are our opinions on where specific tests should go:

Source tests should be fixable data quality concerns. See the callout box below for what we mean by “fixable”.
Staging tests should be business-focused anomalies specific to individual tables, such as accepted ranges or ensuring sequential values. In addition to these tests, your staging layer should clean up any nulls, duplicates, or outliers that you can’t fix in your source system. You generally don’t need to test your cleanup efforts.
Intermediate and marts layer tests should be business-focused anomalies resulting specifically from joins or calculations. You also may consider adding additional primary key and not null tests on columns where it’s especially important to protect the grain.
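As a rough illustration of the placement described above, here is a minimal sketch of a dbt properties file. The source, model, and column names (`shop`, `stg_orders`, `fct_orders`, and so on) are hypothetical, and the `accepted_range` test assumes the `dbt_utils` package is installed. The `data_tests:` key applies to dbt v1.8 and later; earlier versions use `tests:`.

```yaml
# Hypothetical names throughout; adapt to your own project.
sources:
  - name: shop
    tables:
      - name: raw_orders
        columns:
          - name: order_id
            data_tests:
              # Source layer: a concern that can be fixed in the source system
              - not_null

models:
  - name: stg_orders
    columns:
      - name: order_total
        data_tests:
          # Staging layer: business-focused anomaly on a single table
          # (assumes the dbt_utils package is installed)
          - dbt_utils.accepted_range:
              min_value: 0

  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          # Marts layer: protect the grain after joins and calculations
          - unique
          - not_null
```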

Test smarter not harder: add the right tests to your dbt project

November 11, 2024 · 11 min read

Faith McKenna

Senior Technical Instructor at dbt Labs

Jerrie Kumalah Kenney

Resident Architect at dbt Labs

The Analytics Development Lifecycle (ADLC) is a workflow for improving data maturity and velocity. Testing is a key phase here. Many dbt developers tend to focus on primary keys and source freshness. We think there is a more holistic and in-depth path to tread. Testing is a key piece of the ADLC, and it should drive data quality.

In this blog, we’ll walk through a plan to define data quality. This will look like:

identifying data hygiene issues
identifying business-focused anomaly issues
identifying stats-focused anomaly issues

Once we have defined data quality, we’ll move on to prioritize those concerns. We will:

think through each concern in terms of the breadth of impact
decide if each concern should be at error or warning severity
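One way to record the error-versus-warning decision is the `severity` config on a data test. This is a minimal sketch with hypothetical model and column names; which concerns deserve `error` and which deserve `warn` depends on your own prioritization.

```yaml
# Hypothetical model and columns; severity choices are illustrative only.
models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              config:
                severity: error   # broad impact: fail the run
      - name: discount_code
        data_tests:
          - not_null:
              config:
                severity: warn    # narrower impact: surface it, but keep going
```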

Snowflake feature store and dbt: A bridge between data pipelines and ML

October 8, 2024 · 14 min read

Randy Pettus

Senior Partner Sales Engineer at Snowflake

Luis Leon

Partner Solutions Architect at dbt Labs

I was flying home into Detroit this past week, working on this blog post on the plane, when I saw for the first time the newly connected deck of the Gordie Howe International Bridge spanning the Detroit River and connecting the U.S. and Canada. The image stuck out because, in one sense, a feature store is a bridge between the clean, consistent datasets and the machine learning models that rely upon this data. But more interesting than the bridge itself is the massive process of coordination needed to build it. This construction effort, I think, can teach us more about processes and the need for feature stores in machine learning (ML).

Think of the manufacturing materials needed as our data and the building of the bridge as the building of our ML models. There are thousands of engineers and construction workers taking materials from all over the world, pulling only the specific pieces needed for each part of the project. However, to make this project truly work at this scale, we need the warehousing and logistics to ensure that each load of concrete, rebar, and steel meets the standards for quality and safety needed and is available to the right people at the right time, as even a single fault can have catastrophic consequences or cause serious delays in project success. This warehouse and the associated logistics play the role of the feature store, ensuring that data is delivered consistently where and when it is needed to train and run ML models.

Iceberg Is An Implementation Detail

October 4, 2024 · 6 min read

Amy Chen

Product Manager at dbt Labs

If you haven’t paid attention to the data industry news cycle, you might have missed the recent excitement centered around an open table format called Apache Iceberg™. It’s one of many open table formats like Delta Lake, Hudi, and Hive. These formats are changing the way data is stored and metadata accessed. They are groundbreaking in many ways.

But I have to be honest: I don’t care. But not for the reasons you think.

How Hybrid Mesh unlocks dbt collaboration at scale

September 30, 2024 · 7 min read

Jason Ganz

Director of Community, Developer Experience & AI at dbt Labs

One of the most important things that dbt does is unlock the ability for teams to collaborate on creating and disseminating organizational knowledge.

In the past, this primarily looked like a team working in one dbt Project to create a set of transformed objects in their data platform.

As dbt was adopted by larger organizations and began to drive workloads at a global scale, it became clear that we needed mechanisms to allow teams to operate independently from each other, creating and sharing data models across teams — dbt Mesh.

How to build a Semantic Layer in pieces: step-by-step for busy analytics engineers

July 10, 2024 · 10 min read

Gwen Windflower

Senior Developer Experience Advocate

The dbt Semantic Layer is founded on the idea that data transformation should be both flexible, allowing for on-the-fly aggregations grouped and filtered by definable dimensions, and version-controlled and tested. Like any other codebase, you should have confidence that your transformations express your organization’s business logic correctly. Historically, you had to choose between these options, but the dbt Semantic Layer brings them together. This has required new paradigms for how you express your transformations, though.

Putting Your DAG on the internet

June 14, 2024 · 5 min read

Ernesto Ongaro

Senior Solutions Architect at dbt Labs

Sebastian Stan

Data Engineer at EQT Group

Filip Byrén

VP and Software Architect at EQT Group

New in dbt: allow Snowflake Python models to access the internet

With dbt 1.8, dbt released support for Snowflake’s external access integrations, further enabling the use of dbt + AI to enrich your data. This allows querying of external APIs within dbt Python models, functionality that was required by dbt Cloud customer EQT AB. Learn why they needed it and how they helped build the feature and get it shipped!

Up and Running with Azure Synapse on dbt Cloud

May 17, 2024 · 11 min read

Anders Swanson

Senior Developer Experience Advocate at dbt Labs

At dbt Labs, we’ve always believed in meeting analytics engineers where they are. That’s why we’re so excited to announce that today, analytics engineers within the Microsoft Ecosystem can use dbt Cloud with not only Microsoft Fabric but also Azure Synapse Analytics Dedicated SQL Pools (ASADSP).

Since the early days of dbt, folks have been interested in having support for MSFT data platforms. Huge shoutout to Mikael Ene and Jacob Mastel for their efforts back in 2019 on the original SQL Server adapters (dbt-sqlserver and dbt-mssql, respectively).

The journey for the Azure Synapse dbt adapter, dbt-synapse, is closely tied to my journey with dbt. I was the one who forked dbt-sqlserver into dbt-synapse in April of 2020. I had first learned of dbt only a month earlier and knew immediately that my team needed the tool. With a great deal of assistance from Jeremy and experts at Microsoft, my team and I got it off the ground and started using it. When I left my team at Avanade in early 2022 to join dbt Labs, I joked that I wasn’t actually leaving the team; I was just temporarily embedding at dbt Labs to expedite dbt Labs getting into Cloud. Two years later, I can tell my team that the mission has been accomplished! Kudos to all the folks who have contributed to the TSQL adapters either directly in GitHub or in the community Slack channels. The integration would not exist if not for you!

Unit testing in dbt for test-driven development

May 7, 2024 · 9 min read

Doug Beatty

Senior Developer Experience Advocate at dbt Labs

Do you ever have "bad data" dreams? Or am I the only one that has recurring nightmares? 😱

Here's the one I had last night:

It began with a midnight bug hunt. A menacing insect creature has locked my colleagues in a dungeon, and they are pleading for my help to escape. Finding the key is elusive and always seems just beyond my grasp. The stress is palpable, a physical weight on my chest, as I race against time to unlock them.

Of course I wake up without actually having saved them, but I am relieved nonetheless. And I've had similar nightmares involving a heroic code refactor or the launch of a new model or feature.

Good news: beginning in dbt v1.8, we're introducing a first-class unit testing framework that can handle each of the scenarios from my data nightmares.
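As a rough preview of the shape such a test can take, here is a minimal sketch of a unit test definition. The model `dim_customers`, its single input `stg_customers`, and the `is_valid_email` column are hypothetical names chosen for illustration, not taken from the post.

```yaml
# Hypothetical model, input, and columns; requires dbt v1.8 or later.
unit_tests:
  - name: test_is_valid_email
    description: "Rows with a malformed email should be flagged as invalid."
    model: dim_customers
    given:
      # Mocked rows stand in for the model's real upstream input
      - input: ref('stg_customers')
        rows:
          - {customer_id: 1, email: "cool@example.com"}
          - {customer_id: 2, email: "missing-at-sign.example.com"}
    expect:
      # Only the columns you care about need to be listed
      rows:
        - {customer_id: 1, is_valid_email: true}
        - {customer_id: 2, is_valid_email: false}
```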

Before we dive into the details, let's take a quick look at how we got here.

Conversational Analytics: A Natural Language Interface to your Snowflake Data

May 2, 2024 · 12 min read

Doug Guthrie

Senior Solutions Architect at dbt Labs

Introduction

As a solutions architect at dbt Labs, my role is to help our customers and prospects understand how to best utilize the dbt Cloud platform to solve their unique data challenges. That uniqueness presents itself in different ways - organizational maturity, data stack, team size and composition, technical capability, use case, or some combination of those. With all those differences though, there has been one common thread throughout most of my engagements: Generative AI and Large Language Models (LLMs). Data teams are either 1) proactively thinking about applications for it in the context of their work or 2) being pushed to think about it by their stakeholders. It has become the elephant in every single (Zoom) room I find myself in.

How we're making sure you can confidently switch to the "Latest" release track in dbt Cloud

May 2, 2024 · 10 min read

Michelle Ark

Staff Software Engineer at dbt Labs

Chenyu Li

Staff Software Engineer at dbt Labs

Colin Rogers

Senior Software Engineer at dbt Labs

Versionless is now the "latest" release track

This blog post was updated on December 04, 2024 to rename "versionless" to the "latest" release track allowing for the introduction of less-frequent release tracks. Learn more about Release Tracks and how to use them.

As long as dbt Cloud has existed, it has required users to select a version of dbt Core to use under the hood in their jobs and environments. This made sense in the earliest days, when dbt Core minor versions often included breaking changes. It provided a clear way for everyone to know which version of the underlying runtime they were getting.

However, this came at a cost. While bumping a project's dbt version appeared as simple as selecting from a dropdown, there was real effort required to test the compatibility of the new version against existing projects, package dependencies, and adapters. On the other hand, putting this off meant foregoing access to new features and bug fixes in dbt.

But no more. Today, we're ready to announce the general availability of a new option in dbt Cloud: the "Latest" release track.

Maximum override: Configuring unique connections in dbt Cloud

April 22, 2024 · 6 min read

Gwen Windflower

Senior Developer Experience Advocate

dbt Cloud now includes a suite of new features that enable configuring precise and unique connections to data platforms at the environment and user level. These enable more sophisticated setups, like connecting a project to multiple warehouse accounts, first-class support for staging environments, and user-level overrides for specific dbt versions. This gives dbt Cloud developers the features they need to tackle more complex tasks, like Write-Audit-Publish (WAP) workflows and safely testing dbt version upgrades. While you still configure a default connection at the project level and per-developer, you now have tools to get more advanced in a secure way. Soon, dbt Cloud will take this even further with global connections, allowing multiple connections to be set globally and reused.

LLM-powered Analytics Engineering: How we're using AI inside of our dbt project, today, with no new tools.

March 19, 2024 · 10 min read

Joel Labes

Staff Developer Experience Advocate at dbt Labs

Cloud Data Platforms make new things possible; dbt helps you put them into production

The original paradigm shift that enabled dbt to exist and be useful was databases going to the cloud.

All of a sudden it was possible for more people to do better data work as huge blockers became huge opportunities:

We could now dynamically scale compute on-demand, without upgrading to a larger on-prem database.
We could now store and query enormous datasets like clickstream data, without pre-aggregating and transforming it.

Today, the next wave of innovation is happening in AI and LLMs, and it's coming to the cloud data platforms dbt practitioners are already using every day. For one example, Snowflake have just released their Cortex functions to access LLM-powered tools tuned for running common tasks against your existing datasets. In doing so, there is a new set of opportunities available to us:

Column-Level Lineage, Model Performance, and Recommendations: ship trusted data products with dbt Explorer

February 13, 2024 · 9 min read

Dave Connors

Product Manager at dbt Labs

What’s in a data platform?

Raising a dbt project is hard work. We, as data professionals, have poured ourselves into raising happy healthy data products, and we should be proud of the insights they’ve driven. It certainly wasn’t without its challenges though — we remember the terrible twos, where we worked hard to just get the platform to walk straight. We remember the angsty teenage years where tests kept failing, seemingly just to spite us. A lot of blood, sweat, and tears are shed in the service of clean data!

Once the project could dress and feed itself, we also worked hard to get buy-in from our colleagues who put their trust in our little project. Without deep trust and understanding of what we built, our colleagues who depend on your data (or even those involved in developing it with you — it takes a village after all!) are more likely to be in your DMs with questions than in their BI tools, generating insights.

When our teammates ask about where the data in their reports comes from, how fresh it is, or about the right calculation for a metric, what a joy! This means they want to put what we’ve built to good use. The challenge is that, historically, it hasn’t been all that easy to answer these questions well. That has often meant a manual, painstaking process of cross-checking run logs and your dbt documentation site to get the stakeholder the information they need.

Enter dbt Explorer! dbt Explorer centralizes documentation, lineage, and execution metadata to reduce the work required to ship trusted data products faster.

Serverless, free-tier data stack with dlt + dbt core.

January 15, 2024 · 8 min read

Euan Johnston

Freelance Business Intelligence manager

The problem, the builder and tooling

The problem: My partner and I are considering buying a property in Portugal. There is no reference data for the real estate market here - how many houses are being sold, for what price? Nobody knows except the property office and maybe the banks, and they don’t readily divulge this information. The only data source we have is Idealista, which is a portal where real estate agencies post ads.

Unfortunately, there are significantly fewer properties than ads - it seems many real estate companies re-post the same ad that others do, with intentionally different data and often misleading bits of info. The real estate agencies do this so the interested parties reach out to them for clarification, and from there they can start a sales process. At the same time, the website with the ads is incentivised to allow this to continue as they get paid per ad, not per property.

The builder: I’m a data freelancer who deploys end to end solutions, so when I have a data problem, I cannot just let it go.

The tools: I want to be able to run my project on Google Cloud Functions due to the generous free tier. dlt is a new Python library for declarative data ingestion which I have wanted to test for some time. Finally, I will use dbt Core for transformation.
