Once in a while, you stumble upon a nice feature of SQL Server that you knew existed, but never really explored. Despite it being in SQL Server since 2005. Let’s do this today with one of the most powerful features in Transact-SQL (T-SQL) ROW_NUMBER(). This function is designed to “assign a unique sequential integer to each record within a particular partition of a query.” This post explains this in more detail.
What is ROW_NUMBER()?
ROW_NUMBER()
is a window function in T-SQL that assigns a unique integer to each record in a result set, based on an order your specified for your result set. The syntax looks like this:
ROW_NUMBER() OVER (PARTITION BY [Criterion] ORDER BY [OrderingCriterion])
Why Use ROW_NUMBER()?
- Specific Grouping: By combining PARTITION BY, records can be divided into logical groups, which is particularly useful when retrieving the latest or most relevant data for each group.
- Ordered Results: ORDER BY within the OVER clause ensures that records within each group are ordered according to your specifications.
- Precision and Performance: When you need the latest record per group, ROW_NUMBER() provides an efficient solution compared to other approaches like self-written sub-queries or more complex JOIN operations.
Scenario: Sales Leaderboard with Rank by Latest Sale
Imagine we have a Sales
table with the following columns: SalespersonId
, SaleAmount
, SaleDate
. Our goal is to create a leaderboard showing each salesperson’s rank based on their latest sale amount.
Using MAX()
:
To achieve this using MAX()
, we would need to write a complex query that first finds the latest sale for each salesperson and then somehow calculates the rank:
-- Find the latest sale for each salesperson
WITH LatestSales AS (
SELECT SalespersonId, MAX(SaleDate) AS LatestSaleDate
FROM Sales
GROUP BY SalespersonId
)
-- Join to get the sale amount for the latest sale
, LatestSalesAmounts AS (
SELECT s.SalespersonId, s.SaleAmount
FROM Sales s
JOIN LatestSales ls ON s.SalespersonId = ls.SalespersonId AND s.SaleDate = ls.LatestSaleDate
)
-- Now, you would have to calculate the rank, which isn't straightforward
This approach becomes cumbersome and less efficient because we need to join back to the original table to get the SaleAmount
for the latest sale, and we haven’t even calculated the ranks yet.
Using ROW_NUMBER()
:
Now, let’s see how ROW_NUMBER()
can simplify this task and potentially be faster:
WITH RankedSales AS (
SELECT
SalespersonId,
SaleAmount,
SaleDate,
ROW_NUMBER() OVER (PARTITION BY SalespersonId ORDER BY SaleDate DESC) AS rn
FROM
Sales
)
SELECT
SalespersonId,
SaleAmount
FROM
RankedSales
WHERE
rn = 1
ORDER BY
SaleAmount DESC;
In this query, ROW_NUMBER()
does all the heavy lifting in a single sweep:
- It partitions the data by each salesperson.
- Orders their sales by date descending, so the latest sale gets the row number 1.
- By selecting rows where
rn = 1
, we are effectively getting the latest sale for each salesperson.
We then order the final result by SaleAmount
to get the leaderboard.
Benefits of Using ROW_NUMBER()
:
- Elegance: The
ROW_NUMBER()
function provides a neater solution that is easier to read and understand. - Performance: Because
ROW_NUMBER()
can leverage indexing especially if there’s an index on theSaleDate
column, it might be faster as it avoids additional joins and subqueries. - Flexibility: If we later decide we want the top 3 latest sales per salesperson, we can simply change the
WHERE rn = 1
filter toWHERE rn <= 3
.
In scenarios where we need more than just the aggregate of a set—such as individual row details or when we want to perform ranking, paging, or retrieve a subset of rows within a group—ROW_NUMBER()
often proves to be the superior choice. It provides a balance between power and performance that aggregate functions alone cannot offer.