User-defined functions Beta
User-defined functions (UDFs) enable you to define and register custom functions in your warehouse. Like macros, UDFs promote code reuse, but they are objects in the warehouse so you can reuse the same logic in tools outside dbt, such as BI tools, data science notebooks, and more.
UDFs are particularly valuable for sharing logic across multiple tools, standardizing complex business calculations, improving performance for compute-intensive operations (since they're compiled and optimized by your warehouse's query engine), and version controlling custom logic within your dbt project.
dbt creates, updates, and renames UDFs as part of DAG execution. The UDF is built in the warehouse before the model that references it.
Prerequisites
-
Make sure you're using dbt platform's Latest Fusion or Latest release track or dbt Core v1.11.
-
Use one of the following adapters:
- dbt Core
- dbt Fusion engine
- BigQuery
- Snowflake
- Redshift
- Postgres
- Databricks
- BigQuery
- Snowflake
- Redshift
- Databricks
Defining UDFs in dbt
To define UDFs in dbt, refer to the following steps:
-
Create a SQL file under the
functions
directory. For example:functions/is_positive_int.sql
REGEXP_CONTAINS(a_string, r'^[0-9]+$')Note: You can specify configs in the SQL file or in the corresponding YAML file in Step 2.
-
Specify the function name and define the config, properties, return type, and optional arguments in a corresponding YAML file. For example:
functions/schema.ymlfunctions:
- name: is_positive_int # required
description: My UDF that determines if a string represents a positive (+) integer # required
config:
schema: udf_schema
database: udf_db
arguments: # optional
- name: a_string # required if arguments is specified
data_type: string # required if arguments is specified
description: The string that I want to check if it's representing a positive integer (like "10")
returns: # required
data_type: boolean # requiredThe rendered
CREATE
UDF statement depends on which adapter you’re using. For example:- Snowflake
- Redshift
- BigQuery
- Databricks
- Postgres
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS BOOLEAN
LANGUAGE SQL
AS $$
REGEXP_CONTAINS(a_string, r'^[0-9]+$')
$$;CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(VARCHAR)
RETURNS BOOLEAN
VOLATILE # Technically this function could be set as STABLE, but we don't support setting volatility yet
AS $$
REGEXP_CONTAINS($1, r'^[0-9]+$')
$$ LANGUAGE SQL;CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS BOOLEAN
AS (
REGEXP_CONTAINS(a_string, r'^[0-9]+$')
);CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS BOOLEAN
LANGUAGE SQL
RETURN
REGEXP_CONTAINS(a_string, r'^[0-9]+$');CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS BOOLEAN
AS $$
REGEXP_CONTAINS(a_string, r'^[0-9]+$')
$$ LANGUAGE SQL; -
Reference the UDF in a model using the
{{ function(...) }}
macro. For example:models/my_model.sqlselect
maybe_positive_int_column,
{{ function('is_positive_int') }}(maybe_positive_int_column)
from {{ ref('a_model_i_like') }} -
Run
dbt compile
. In the following example, the{{ function('is_positive_int') }}
is replaced by the UDF nameudf_db.udf_schema.is_positive_int
.models/my_model.sqlselect
maybe_positive_int_column,
udf_db.udf_schema.is_positive_int(maybe_positive_int_column) as is_positive
from analytics.dbt_schema.a_model_i_likeIn your DAG, a UDF node is created from the SQL and YAML definitions, and there will be a dependency between
is_positive_int
→my_model
.
After defining a UDF, if you update the SQL file that contains its function body (is_positive_int.sql
in this example) or its configurations, your changes will be applied to the UDF in the warehouse next time you build
.
Using UDFs in unit tests
You can use unit tests to validate models that reference UDFs. Before running unit tests, make sure the function exists in your warehouse. To ensure that the function exists for a unit test, run:
dbt build --select "+my_model_to_test" --empty
Following the example in Defining UDFs in dbt, here's an example of a unit test that validates a model that calls a UDF:
unit_tests:
- name: test_is_positive_int
description: "Check my is_positive_int logic captures edge cases"
model: my_model
given:
- input: ref('a_model_i_like')
rows:
- { maybe_positive_int_column: 10 }
- { maybe_positive_int_column: -4 }
- { maybe_positive_int_column: +8 }
- { maybe_positive_int_column: 1.0 }
expect:
rows:
- { maybe_positive_int_column: 10, is_positive: true }
- { maybe_positive_int_column: -4, is_positive: false }
- { maybe_positive_int_column: +8, is_positive: true }
- { maybe_positive_int_column: 1.0, is_positive: true }
Listing and selecting UDFs
To list UDFs in your project, run dbt list --select "resource_type:function"
or dbt list --resource-type function
.
To select UDFs when building a project, run dbt build --select "resource_type:function"
.
For more information about selecting UDFs, see the examples in Node selector methods.
Limitations
- Creating UDFs in other languages (for example, Python, Java, or Scala) is not yet supported.
- Only scalar functions are currently supported.
Related FAQs
Was this page helpful?
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.