SQL ROW_NUMBER
In this page, let’s go deep into the ROW_NUMBER function and talk about what it is, how to use it, and why it’s important in analytics engineering work.
The ROW_NUMBER window function is an effective way to create a ranked column or filter a query based on rankings. More specifically, the ROW_NUMBER function returns the unique row number of a row in an ordered group or dataset.
Unlike the RANK and DENSE_RANK functions, ROW_NUMBER is non-deterministic, meaning that a unique number is assigned arbitrarily for rows with duplicate values.
How to use the ROW_NUMBER function
The ROW_NUMBER function has a pretty simple syntax, with an optional partition field and support for ordering customization:
row_number() over ([partition by <field(s)>] order by field(s) [asc | desc])
Some notes on this function’s syntax:
- The
partition by
field is optional; if you want to get the row numbers of your entire dataset (compared to grabbing row number within a group of rows in your dataset), you would simply omit thepartition by
from the function call (see the example below for this). - By default, the ordering of a ROW_NUMBER function is set to ascending. To explicitly make the resulting order descending, you’ll need to pass in
desc
to theorder by
part of the function.
Let’s take a look at a practical example using the ROW_NUMBER function below.
ROW_NUMBER function example
select
customer_id,
order_id,
order_date,
row_number() over (partition by customer_id order by order_date) as row_n
from {{ ref('orders') }}
order by 1
This simple query using the Jaffle Shop’s orders
table will return the unique row number per customer by their order_date
:
customer_id | order_id | order_date | row_n |
---|---|---|---|
1 | 1 | 2018-01-01 | 1 |
1 | 37 | 2018-02-10 | 2 |
2 | 8 | 2018-01-11 | 1 |
3 | 2 | 2018-01-02 | 1 |
3 | 24 | 2018-01-27 | 2 |
3 | 69 | 2018-03-11 | 3 |
Because ROW_NUMBER is non-deterministic, orders per customer that have the same order_date
would have unique row_n
values (unlike if you used the RANK or DENSE_RANK functions).
ROW_NUMBER syntax in Snowflake, Databricks, BigQuery, and Redshift
Most, if not all, modern data warehouses support ROW_NUMBER and other similar ranking functions; the syntax is also the same across them. Use the table below to read more on the documentation for the ROW_NUMBER function in your data warehouse.
Data warehouse | ROW_NUMBER support? |
---|---|
Snowflake | ✅ |
Databricks | ✅ |
Amazon Redshift | ✅ |
Google BigQuery | ✅ |
ROW_NUMBER function use cases
We most commonly see the ROW_NUMBER function used in data work to:
- In SELECT statements to add explicit and unique row numbers in a group of data or across an entire table
- Paired with QUALIFY statement, filter CTEs, queries, or models to capture one unique row per specified partition with the ROW_NUMBER function. This is particularly useful when you need to remove duplicate rows from a dataset (but use this wisely!).
This isn’t an extensive list of where your team may be using the ROW_NUMBER function throughout your dbt models, but contains some common scenarios analytics engineers face day-to-day.