Skip to main content

Add metrics to your DAG

Coming soon

The dbt Semantic Layer is undergoing some sophisticated changes, enabling more complex metric definitions and efficient querying. As part of these changes, the dbt_metrics package will be deprecated and replaced with MetricFlow. For more info, check out the The dbt Semantic Layer: what's next? and dbt_metrics deprecation blog.

Changelog

About Metrics

A metric is an aggregation over a table that supports zero or more dimensions. Some examples of metrics include:

  • active users
  • monthly recurring revenue (mrr)

In v1.0, dbt supports metric definitions as a new node type. Like exposures, metrics appear as nodes in the directed acyclic graph (DAG) and can be expressed in YAML files. Defining metrics in dbt projects encodes crucial business logic in tested, version-controlled code. Further, you can expose these metrics definitions to downstream tooling, which drives consistency and precision in metric reporting.

Review the video below to learn more about metrics, why they're important, and how to get started:

Benefits of defining metrics

Use metric specifications in downstream tools
dbt's compilation context can access metrics via the graph.metrics variable. The manifest artifact includes metrics for downstream metadata consumption.

See and select dependencies
As with Exposures, you can see everything that rolls up into a metric (dbt ls -s +metric:*), and visualize them in dbt documentation. For more information, see "The metric: selection method."

Metrics appear as pink nodes in the DAG (for now)Metrics appear as pink nodes in the DAG (for now)

Defining a metric

You can define metrics in .yml files nested under a metrics: key. Metric names must:

  • contain only letters, numbers, and underscores (no spaces or special characters)
  • begin with a letter
  • contain no more than 250 characters

For a short human-friendly name with title casing, spaces, and special characters, use the label property. More examples and guidance for how to define and structure metrics can be found here..

Example definition

models/<filename>.yml
caution
  • You cannot define metrics on ephemeral models. To define a metric, the materialization must have a representation in the data warehouse.

Available properties

Metrics can have many declared properties, which define aspects of your metric. More information on properties and configs can be found here.

Available calculation methods

Description
countThis metric type will apply the count aggregation to the specified field
count_distinctThis metric type will apply the count aggregation to the specified field, with an additional distinct statement inside the aggregation
sumThis metric type will apply the sum aggregation to the specified field
averageThis metric type will apply the average aggregation to the specified field
minThis metric type will apply the min aggregation to the specified field
maxThis metric type will apply the max aggregation to the specified field
medianThis metric type will apply the median aggregation to the specified field, or an alternative percentile_cont aggregation if median is not available

Filters

Filters should be defined as a list of dictionaries that define predicates for the metric. Filters are combined using AND clauses. For more control, users can (and should) include the complex logic in the model powering the metric.

All three properties (field, operator, value) are required for each defined filter.

Note that value must be defined as a string in YAML, because it will be compiled into queries as part of a string. If your filter's value needs to be surrounded by quotes inside the query (e.g. text or dates), use "'nested'" quotes:

    filters:
- field: is_paying
operator: 'is'
value: 'true'
- field: lifetime_value
operator: '>='
value: '100'
- field: company_name
operator: '!='
value: "'Acme, Inc'"
- field: signup_date
operator: '>='
value: "'2020-01-01'"

Calendar

The dbt_metrics package contains a basic calendar table that is created as part of your dbt run. It contains dates between 2010-01-01 and 2029-12-31.

If you want to use a custom calendar, you can replace the default with any table which meets the following requirements:

  • Contains a date_day column.
  • Contains the following columns: date_week, date_month, date_quarter, date_year, or equivalents.
  • Additional date columns need to be prefixed with date_, e.g. date_4_5_4_month for a 4-5-4 retail calendar date set. Dimensions can have any name (see following section).

To do this, set the value of the dbt_metrics_calendar_model variable in your dbt_project.yml file:

#dbt_project.yml
config-version: 2
[...]
vars:
dbt_metrics_calendar_model: my_custom_calendar

Dimensions from calendar tables

You may want to aggregate metrics by a dimension in your custom calendar table, for example is_weekend. You can include this within the list of dimensions in the macro call without it needing to be defined in the metric definition.

To do so, set a list variable at the project level called custom_calendar_dimension_list, as shown in the example below.

#dbt_project.yml
vars:
custom_calendar_dimension_list: ["is_weekend"]

Querying Your Metric

You can dynamically query metrics directly in dbt and verify them before running a job in the deployment environment. To query your defined metric, you must have the dbt_metrics package installed. Information on how to install packages can be found here.

Use the following metrics package installation code in your packages.yml file and run dbt deps to install the metrics package:

Once the package has been installed with dbt deps, make sure to run the dbt_metrics_default_calendar model as this is required for macros used to query metrics. More information on this, and additional calendar functionality, can be found in the project README.

Querying metrics with metrics.calculate

Use the metrics.calculate macro along with defined metrics to generate a SQL statement that runs the metric aggregation to return the correct metric dataset. Example below:

Supported inputs

The example above doesn't display all the potential inputs you can provide to the macro.

You may find some pieces of functionality, like secondary calculations, complicated to use. We recommend reviewing the package README for more in-depth information about each of the inputs that are not covered in the table below.

InputExampleDescriptionRequired
Required
grain'day', 'week',
'month', 'quarter',
'year'
The time grain that the metric will be aggregated to in the returned datasetOptional
dimensions['plan',
'country']
The dimensions you want the metric to be aggregated by in the returned datasetOptional
secondary_calculations[metrics.period_over_period( comparison_strategy="ratio", interval=1, alias="pop_1wk")]Performs the specified secondary calculation on the metric results. Examples include period over period calculations, rolling calculations, and period to date calculations.Optional
start_date'2022-01-01'Limits the date range of data used in the metric calculation by not querying data before this dateOptional
end_date'2022-12-31'Limits the date range of data used in the metric calculation by not querying data after this dateOptional
whereplan='paying_customer'A sql statement, or series of sql statements, that alter the final CTE in the generated sql. Most often used to limit the data to specific values of dimensions providedOptional

Secondary Calculations

Secondary calculations are window functions you can add to the metric calculation and perform on the primary metric or metrics.

You can use them to compare values to an earlier period, calculate year-to-date sums, and return rolling averages. You can add custom secondary calculations into dbt projects - for more information on this, reference the package README.

The supported Secondary Calculations are:

Period over Period:

The period over period secondary calculation performs a calculation against the metric(s) in question by either determining the difference or the ratio between two points in time. The input variable, which looks at the grain selected in the macro, determines the other point.

InputExampleDescriptionRequired
comparison_strategyratio or differenceHow to calculate the delta between the two periodsYes
interval1Integer - the number of time grains to look backYes
aliasweek_over_weekThe column alias for the resulting calculationNo
metric_listbase_sum_metricList of metrics that the secondary calculation should be applied to. Default is all metrics selectedNo

Period to Date:

The period to date secondary calculation performs an aggregation on a defined period of time that is equal to or higher than the grain selected. For example, when you want to display a month_to_date value alongside your weekly grained metric.

InputExampleDescriptionRequired
aggregatemax, averageThe aggregation to use in the window function. Options vary based on the primary aggregation and are enforced in validate_aggregate_coherence().Yes
period"day", "week"The time grain to aggregate to. One of ["day", "week", "month", "quarter", "year"]. Must be at equal or coarser (higher, more aggregated) granularity than the metric's grain (see Time Grains below). In example grain of month, the acceptable periods would be month, quarter, or year.Yes
aliasmonth_to_dateThe column alias for the resulting calculationNo
metric_listbase_sum_metricList of metrics that the secondary calculation should be applied to. Default is all metrics selectedNo

Rolling:

Developing metrics with metrics.develop

Multiple/Derived Metrics with metrics.develop

If you have a more complicated use case that you are interested in testing, the develop macro also supports this behavior. The only caveat is that you must include the raw tags for any provided metric yml that contains a derived metric. Example below:

{% set my_metric_yml -%}
{% raw %}

metrics:
- name: develop_metric
model: ref('fact_orders')
label: Total Discount ($)
timestamp: order_date
time_grains: [day, week, month]
calculation_method: average
expression: discount_total
dimensions:
- had_discount
- order_country

- name: derived_metric
label: Total Discount ($)
timestamp: order_date
time_grains: [day, week, month]
calculation_method: derived
expression: "{{ metric('develop_metric') }} - 1 "
dimensions:
- had_discount
- order_country

- name: some_other_metric_not_using
label: Total Discount ($)
timestamp: order_date
time_grains: [day, week, month]
calculation_method: derived
expression: "{{ metric('derived_metric') }} - 1 "
dimensions:
- had_discount
- order_country

{% endraw %}
{%- endset %}

select *
from {{ metrics.develop(
develop_yml=my_metric_yml,
metric_list=['derived_metric']
grain='month'
)
}}

The above example will return a dataset that contains the metric provided in the metric list (derived_metric) and the parent metric (develop_metric). It will not contain some_other_metric_not_using as it is not designated in the metric list or a parent of the metrics included.

Important caveat - You must wrap the expression property for derived metrics in double quotes to render it. For example, expression: "{{ metric('develop_metric') }} - 1 ".

Questions from the Community

0