Getting the Top Purchases for Each Product
From time to time, any analyst will want to know the “top n instances” of something. For example, as the holidays approach, a toy store may want to know who the top customers of certain products are, so they can prepare special marketing for those customers.
As usual, Postgres makes this easy with a couple of special-purpose functions: string_agg and array_agg.
However, those of us on other databases have to do without. In particular, Amazon Redshift doesn’t yet support these functions. In this post, we’ll show you how to use window functions and self joins to find the top 5 purchasers of each product.
Counting the Purchasers
Let’s start with a simple group-and-count that gives us a table of how many times any customer purchased any product:
select
products.name as product_name,
customers.name as customer_name,
count(1) purchase_count
from products
join purchases on purchases.product_id = products.id
group by 1, 2
This gives us the simple table we were expecting:

Ranking Purchasers
Now, for each product and purchaser, we want to know that purchaser’s rank for that product. Are they the top purchaser? The second to the top?
The row_number window function makes this easy. Here’s how:
select
product_name,
customer_name,
row_number() over (
partition by product_name order by purchase_count desc
) as rank
from (
select
products.name product_name,
customers.name customer_name,
count(1) purchase_count
from products
join purchases on purchases.product_id = products.id
group by 1, 2
) purchase_counts
We’ve put our previous query into a subquery named purchase_counts.
Then, in our row_number window function, we partitioned by product_name because we want a rank per product, and we ordered by purchase_count desc so that the customer with the most purchases has rank 1.
This gives us a handy customer rank for each product, seen here:

Putting the Top 5 Purchasers into One Row
Now that we have a unique rank for each purchaser on each product, if we want the top 5 purchasers, all we have to do is join this table to itself 5 times!
Assuming we have the above results in a purchase_ranks table, here’s the query:
select
pr0.product_name,
pr1.customer_name
|| ', ' || pr2.customer_name
|| ', ' || pr3.customer_name
|| ', ' || pr4.customer_name
|| ', ' || pr5.customer_name as top_purchasing_customers
from
purchase_ranks pr0
join purchase_ranks pr1
on pr0.product_name = pr1.product_name and pr1.rank = 1
join purchase_ranks pr2
on pr0.product_name = pr2.product_name and pr2.rank = 2
join purchase_ranks pr3
on pr0.product_name = pr3.product_name and pr3.rank = 3
join purchase_ranks pr4
on pr0.product_name = pr4.product_name and pr4.rank = 4
join purchase_ranks pr5
on pr0.product_name = pr5.product_name and pr5.rank = 5
The first key for each self join is product_name. That keeps each row about one product.
The second key selects which customer we are bringing in. The first self join brings the top purchaser by requiring pr1.rank = 1. The second self join brings in the second-to-the-top purchaser by requiring pr2.rank = 2, and so on.
In the select clause, we concatenate these customers’ names together, separated by commas. Here’s the result:

Paring down the Results
As you can see, since there are 5 self joins, there are 5 resulting identical rows for each product. To bring it down to one row per product, all we need to do is wrap our query in a select distinct:
select distinct product_name, top_purchasing_customers from (
select
pr0.product_name,
pr1.customer_name
|| ', ' || pr2.customer_name
|| ', ' || pr3.customer_name
|| ', ' || pr4.customer_name
|| ', ' || pr5.customer_name as top_purchasing_customers
from
purchase_ranks pr0
join purchase_ranks pr1
on pr0.product_name = pr1.product_name and pr1.rank = 1
join purchase_ranks pr2
on pr0.product_name = pr2.product_name and pr2.rank = 2
join purchase_ranks pr3
on pr0.product_name = pr3.product_name and pr3.rank = 3
join purchase_ranks pr4
on pr0.product_name = pr4.product_name and pr4.rank = 4
join purchase_ranks pr5
on pr0.product_name = pr5.product_name and pr5.rank = 5
) products_with_top_purchasers
This gives us the final results we’re looking for!
Putting It all Together
Pulling out original purchase counts into a with clause, we get a final query that looks like this:
with purchase_ranks as (
select
product_name,
customer_name,
row_number() over (
partition by product_name order by purchase_count desc
) as rank
from (
select
products.name as product_name,
customers.name as customer_name,
count(1) as purchase_count
from products
join purchases on purchases.product_id = products.id
group by 1, 2
) purchase_counts
)
select distinct product_name, top_purchasing_customers from (
select
pr0.product_name,
pr1.customer_name
|| ', ' || pr2.customer_name
|| ', ' || pr3.customer_name
|| ', ' || pr4.customer_name
|| ', ' || pr5.customer_name as top_purchasing_customers
from
purchase_ranks pr0
join purchase_ranks pr1
on pr0.product_name = pr1.product_name and pr1.rank = 1
join purchase_ranks pr2
on pr0.product_name = pr2.product_name and pr2.rank = 2
join purchase_ranks pr3
on pr0.product_name = pr3.product_name and pr3.rank = 3
join purchase_ranks pr4
on pr0.product_name = pr4.product_name and pr4.rank = 4
join purchase_ranks pr5
on pr0.product_name = pr5.product_name and pr5.rank = 5
) products_with_top_purchasers
Now that we have the top 5 purchasers for each product, we can tailor our marketing specifically to them for the holiday season!
Sisense for Cloud Data Teams and Amazon Web Services combine to provide the fastest and easiest way to deliver scalable, high-performance, and secure cloud analytics.