I, Sung, entered the data industry by chance in Fall 2014. I was using this thing called audit command language (ACL) to automate debits equal credits for accounting analytics (yes, it’s as tedious as it sounds). I remember working my butt off in a hotel room in Des Moines, Iowa where the most interesting thing there was a Panda Express. It was late in the AM. I’m thinking about 2 am. And I took a step back and thought to myself, “Why am I working so hard for something that I just don’t care about with tools that hurt more than help?”
Why we're deprecating the dbt_metrics package
Hello, my dear data people.
If you haven’t read Nick & Roxi’s blog post about what’s coming in the future of the dbt Semantic Layer, I highly recommend you read through that, as it gives helpful context around what the future holds.
With that said, it has come time for us to bid adieu to our beloved dbt_metrics package. Upon the release of dbt-core v1.6 in late July, we will be deprecating support for the dbt_metrics package.
Stronger together: Python, dataframes, and SQL
For years working in data and analytics engineering roles, I treasured the daily camaraderie sharing a small office space with talented folks using a range of tools - from analysts using SQL and Excel to data scientists working in Python. I always sensed that there was so much we could work on in collaboration with each other - but siloed data and tooling made this much more difficult. The diversity of our tools and languages made the potential for collaboration all the more interesting, since we could have folks with different areas of expertise each bringing their unique spin to the project. But logistically, it just couldn’t be done in a scalable way.
So I couldn’t be more excited about dbt’s polyglot capabilities arriving in dbt Core 1.3. This release brings Python dataframe libraries that are crucial to data scientists and enables general-purpose Python but still uses a shared database for reading and writing data sets. Analytics engineers and data scientists are stronger together, and I can’t wait to work side-by-side in the same repo with all my data scientist friends.
Going polyglot is a major next step in the journey of dbt Core. While it expands possibilities, we also recognize the potential for confusion. When combined in an intentional manner, SQL, dataframes, and Python are also stronger together. Polyglot dbt allows informed practitioners to choose the language that best fits your use case.
In this post, we’ll give you your hands-on experience and seed your imagination with potential applications. We’ll walk you through a demo that showcases string parsing - one simple way that Python can be folded into a dbt project.
We’ll also give you the intellectual resources to compare/contrast:
- different dataframe implementations within different data platforms
- dataframes vs. SQL
Finally, we’ll share “gotchas” and best practices we’ve learned so far and invite you to participate in discovering the answers to outstanding questions we are still curious about ourselves.
Based on our early experiences, we recommend that you:
✅ Do: Use Python when it is better suited for the job – model training, using predictive models, matrix operations, exploratory data analysis (EDA), Python packages that can assist with complex transformations, and select other cases where Python is a more natural fit for the problem you are trying to solve.
❌ Don’t: Use Python where the solution in SQL is just as direct. Although a pure Python dbt project is possible, we’d expect the most impactful projects to be a mixture of SQL and Python.
How to design and structure dbt metrics: Recommendations for getting started
IMPORTANT: This document serves as the temporary location for information on how to design and structure your metrics. It is our intention to take this content and turn it into a Guide, like How we structure our dbt projects, but we feel that codifying information in a Guide first requires that metrics be rigorously tested by the community so that best practices can arise. This document contains our early attempts to create best practices. In other words, read these as suggestions for a new paradigm and share in the community where they do (or don’t) match your experiences! You can find more information on where to do this at the end.
The power of a semantic layer on top of a mature data modeling framework
As a longtime dbt Community member, I knew I had to get involved when I first saw the dbt Semantic Layer in the now infamous
dbt should know about metrics Github Issue. It gave me a vision of a world where metrics and business logic were unified across an entire organization; a world where the data team was no longer bound to a single consuming experience and could enable their stakeholders in dozens of different ways. To me, it felt like the opportunity to contribute to the next step of what dbt could become.
In past roles, I’ve been referred to as the
dbt zealot and I’ll gladly own that title! It’s not a surprise - dbt was built to serve data practitioners expand the power of our work with software engineering principles. It gave us flexibility and power to serve our organizations. But I always wondered if there were more folks who could directly benefit from interacting with dbt.
The Semantic Layer expands the reach of dbt by coupling dbt’s mature data modeling framework with semantic definitions. The result is a first of its kind data experience that serves both the data practitioners writing your analytics code and stakeholders who depend on it. Metrics are the first step towards this vision, allowing users to version control and centrally define their key business metrics in a single repo while also serving them to the entire business.
However, this is still a relatively new part of the dbt toolbox and you probably have a lot of questions on how exactly you can do that. This blog contains our early best practice recommendations for metrics in two key areas:
- Design: What logic goes into metrics and how to use calculations, filters, dimensions, etc.
- Structure: Where these metrics will live in your dbt project and how to compose the files that contain your metrics
We developed these recommendations by combining the overall philosophy of dbt, with our hands-on learning gathered during the beta period and internal testing.
Understanding the components of the dbt Semantic Layer
TLDR: The Semantic Layer is made up of a combination of open-source and SaaS offerings and is going to change how your team defines and consumes metrics.
At last year's Coalesce, Drew showed us the future1 - a vision of what metrics in dbt could look like. Since then, we've been getting the infrastructure in place to make that vision a reality. We wanted to share with you where we are today and how it fits into the broader picture of where we're going.
To those who haven't followed this saga with the intensity of someone watching their investments on the crypto market, we're rolling out this new resource to help you better understand the dbt Semantic Layer and provide clarification on the following things:
- What is the dbt Semantic Layer?
- How do I use it?
- What is publicly available now?
- What is still in development?
With that, lets get into it!