Data ecosystem

Walkthroughs of how top data practitioners use tools in the modern data stack.

Parser, Better, Faster, Stronger: A peek at the new dbt engine

February 19, 2025 · 5 min read

Senior Developer Experience Advocate at dbt Labs

Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back.

Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it

After a series of deep dives into the guts of SQL comprehension, let's talk about speed a little bit. Specifically, I want to talk about one of the most annoying slowdowns as your project grows: project parsing.

When you're waiting a few seconds or a few minutes for things to start happening after you invoke dbt, it's because parsing isn't finished yet. But Lukas' SDF demo at last month's webinar didn't have a big wait, so why not?

The key technologies behind SQL Comprehension

January 24, 2025 · 16 min read

Dave Connors

Staff Developer Experience Advocate at dbt Labs

You ever wonder what’s really going on in your database when you fire off a (perfect, efficient, full-of-insight) SQL query to your database?

OK, probably not 😅. Your personal tastes aside, we’ve been talking a lot about SQL Comprehension tools at dbt Labs in the wake of our acquisition of SDF Labs, and think that the community would benefit if we included them in the conversation too! We recently published a blog that talked about the different levels of SQL Comprehension tools. If you read that, you may have encountered a few new terms you weren’t super familiar with.

In this post, we’ll talk about the technologies that underpin SQL Comprehension tools in more detail. Hopefully, you come away with a deeper understanding of and appreciation for the hard work that your computer does to turn your SQL queries into actionable business insights!

The Three Levels of SQL Comprehension: What they are and why you need to know about them

January 23, 2025 · 9 min read

Joel Labes

Senior Developer Experience Advocate at dbt Labs

Ever since dbt Labs acquired SDF Labs last week, I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are fascinating.

For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a string preprocessor and into fully comprehending SQL. For the first time, SDF provides the technology necessary to make this possible. Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.

Why I wish I had a control plane for my renovation

January 21, 2025 · 4 min read

Mark Wan

Senior Solutions Architect at dbt Labs

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.

My wife pondering our sanity

We had to coordinate multiple elements:

The architects, who designed the layout, interior, and exterior.
The architectural plans, which outlined what the house should look like.
The builders, who executed those plans.
The inspectors, councils, and energy raters, who checked whether everything met the required standards.

Putting Your DAG on the internet

June 14, 2024 · 5 min read

Ernesto Ongaro

Senior Solutions Architect at dbt Labs

Sebastian Stan

Data Engineer at EQT Group

Filip Byrén

VP and Software Architect at EQT Group

New in dbt: allow Snowflake Python models to access the internet

With dbt 1.8, dbt released support for Snowflake’s external access integrations further enabling the use of dbt + AI to enrich your data. This allows querying of external APIs within dbt Python models, a functionality that was required for dbt Cloud customer, EQT AB. Learn about why they needed it and how they helped build the feature and get it shipped!

LLM-powered Analytics Engineering: How we're using AI inside of our dbt project, today, with no new tools.

March 19, 2024 · 10 min read

Joel Labes

Senior Developer Experience Advocate at dbt Labs

Cloud Data Platforms make new things possible; dbt helps you put them into production

The original paradigm shift that enabled dbt to exist and be useful was databases going to the cloud.

All of a sudden it was possible for more people to do better data work as huge blockers became huge opportunities:

We could now dynamically scale compute on-demand, without upgrading to a larger on-prem database.
We could now store and query enormous datasets like clickstream data, without pre-aggregating and transforming it.

Today, the next wave of innovation is happening in AI and LLMs, and it's coming to the cloud data platforms dbt practitioners are already using every day. For one example, Snowflake have just released their Cortex functions to access LLM-powered tools tuned for running common tasks against your existing datasets. In doing so, there are a new set of opportunities available to us:

Optimizing Materialized Views with dbt

August 3, 2023 · 11 min read

Amy Chen

Product Manager at dbt Labs

note

This blog post was updated on December 18, 2023 to cover the support of MVs on dbt-bigquery and updates on how to test MVs.

Introduction

The year was 2020. I was a kitten-only household, and dbt Labs was still Fishtown Analytics. A enterprise customer I was working with, Jetblue, asked me for help running their dbt models every 2 minutes to meet a 5 minute SLA.

After getting over the initial terror, we talked through the use case and soon realized there was a better option. Together with my team, I created lambda views to meet the need.

Flash forward to 2023. I’m writing this as my giant dog snores next to me (don’t worry the cats have multiplied as well). Jetblue has outgrown lambda views due to performance constraints (a view can only be so performant) and we are at another milestone in dbt’s journey to support streaming. What. a. time.

Today we are announcing that we now support Materialized Views in dbt. So, what does that mean?

Create dbt Documentation and Tests 10x faster with ChatGPT

July 18, 2023 · 8 min read

Pedro Brito de Sa

Product Analyst at Sage

Whether you are creating your pipelines into dbt for the first time or just adding a new model once in a while, good documentation and testing should always be a priority for you and your team. Why do we avoid it like the plague then? Because it’s a hassle having to write down each individual field, its description in layman terms and figure out what tests should be performed to ensure the data is fine and dandy. How can we make this process faster and less painful?

By now, everyone knows the wonders of the GPT models for code generation and pair programming so this shouldn’t come as a surprise. But ChatGPT really shines at inferring the context of verbosely named fields from database table schemas. So in this post I am going to help you 10x your documentation and testing speed by using ChatGPT to do most of the leg work for you.

Data Vault 2.0 with dbt Cloud

July 3, 2023 · 15 min read

Rastislav Zdechovan

Analytics Engineer at Infinite Lambda

Sean McIntyre

Senior Solutions Architect at dbt Labs

Data Vault 2.0 is a data modeling technique designed to help scale large data warehousing projects. It is a rigid, prescriptive system detailed vigorously in a book that has become the bible for this technique.

So why Data Vault? Have you experienced a data warehousing project with 50+ data sources, with 25+ data developers working on the same data platform, or data spanning 5+ years with two or more generations of source systems? If not, it might be hard to initially understand the benefits of Data Vault, and maybe Kimball modelling is better for you. But if you are in any of the situations listed, then this is the article for you!

Making dbt Cloud API calls using dbt-cloud-cli

May 3, 2022 · 13 min read

Simo Tumelius

Freelance Data and Analytics Engineer

Different from dbt Cloud CLI

This blog explains how to use the dbt-cloud-cli Python library to create a data catalog app with dbt Cloud artifacts. This is different from the dbt Cloud CLI, a tool that allows you to run dbt commands against your dbt Cloud development environment from your local command line.

dbt Cloud is a hosted service that many organizations use for their dbt deployments. Among other things, it provides an interface for creating and managing deployment jobs. When triggered (e.g., cron schedule, API trigger), the jobs generate various artifacts that contain valuable metadata related to the dbt project and the run results.

dbt Cloud provides a REST API for managing jobs, run artifacts and other dbt Cloud resources. Data/analytics engineers would often write custom scripts for issuing automated calls to the API using tools cURL or Python Requests. In some cases, the engineers would go on and copy/rewrite them between projects that need to interact with the API. Now, they have a bunch of scripts on their hands that they need to maintain and develop further if business requirements change. If only there was a dedicated tool for interacting with the dbt Cloud API that abstracts away the complexities of the API calls behind an easy-to-use interface… Oh wait, there is: the dbt-cloud-cli!

dbt + Machine Learning: What makes a great baton pass?

March 10, 2022 · 11 min read

Sung Won Chung

Solutions Architect at dbt Labs

Izzy Erekson

Solutions Architect at dbt Labs

Special Thanks: Emilie Schario, Matt Winkler

dbt has done a great job of building an elegant, common interface between data engineers, analytics engineers, and any data-y role, by uniting our work on SQL. This unification of tools and workflows creates interoperability between what would normally be distinct teams within the data organization.

I like to call this interoperability a “baton pass.” Like in a relay race, there are clear handoff points & explicit ownership at all stages of the process. But there’s one baton pass that’s still relatively painful and undefined: the handoff between machine learning (ML) engineers and analytics engineers.

In my experience, the initial collaboration workflow between ML engineering & analytics engineering starts off strong but eventually becomes muddy during the maintenance phase. This eventually leads to projects becoming unusable and forgotten.

In this article, we’ll explore a real-life baton pass between ML engineering and analytics engineering and highlighting where things went wrong.

The Exact GitHub Pull Request Template We Use at dbt Labs

November 29, 2021 · 9 min read

Jess Williams

Head of Professional Services at dbt Labs

Having a GitHub pull request template is one of the most important and frequently overlooked aspects of creating an efficient and scalable dbt-centric analytics workflow. Opening a pull request is the final step of your modeling process - a process which typically involves a lot of complex work!

For you, the dbt developer, the pull request (PR for short) serves as a final checkpoint in your modeling process, ensuring that no key elements are missing from your code or project.

The Spiritual Alignment of dbt + Airflow

November 29, 2021 · 13 min read

Sung Won Chung

Solutions Architect at dbt Labs

Airflow and dbt are often framed as either / or:

You either build SQL transformations using Airflow’s SQL database operators (like SnowflakeOperator), or develop them in a dbt project.

You either orchestrate dbt models in Airflow, or you deploy them using dbt Cloud.

In my experience, these are false dichotomies, that sound great as hot takes but don’t really help us do our jobs as data people.

New in dbt: allow Snowflake Python models to access the internet​

Cloud Data Platforms make new things possible; dbt helps you put them into production​

Introduction​

New in dbt: allow Snowflake Python models to access the internet

Cloud Data Platforms make new things possible; dbt helps you put them into production

Introduction