Selecting the First Row for each Group

Sometimes you just want to graph the winners. Were there more iOS or Android users today? Grouping and counting the daily usage per platform is easy, but getting only the top platform for each day can be tough.

Unlike joining only the first row, primary and foreign keys won’t be of much use, so we’ll need a different approach to make a chart like this:

Top platform by day chart

Let’s start with daily counts per platform from our gameplays table:

select date(created_at) dt, platform, count(1) ct
from gameplays
group by 1, 2

This gives us a familiar table, with one date spanning multiple rows:

dt platform ct
2014-06-30iOS 49751
2014-06-30 android 80781
2014-06-29iOS 158909
2014-06-29android 91380
2014-06-28iOS 108206
2014-06-28 android 95363
2014-06-27iOS 105756
2014-06-27android 92316

We want a table with one row per date with the highest count, like this:

dtplatform ct
2014-06-30android 80781
2014-06-29iOS  158909
2014-06-28iOS  108206
2014-06-27iOS  105756

Postgres and Redshift

As usual on Postgres and Redshift, window functions make this an easy task. We’ll use the row_number() function partitioned by date in an inner query, and then filter to row_num = 1 in the outer query to get just the first record per group.

This function numbers each of the rows:

row_number() over (partition by dt order by ct desc) row_num

We’ll plug it into an inner query, like so, to get the desired results:

select dt, platform, ct 
from (
  select
    date(created_at) dt, 
    platform, 
    count(1) ct,
    row_number() over
      (partition by dt order by ct desc) row_num
  from gameplays
  group by 1, 2
) t
where row_num = 1

MySQL

Since MySQL doesn’t have window functions, we’ll do something similar using group_concat. With group_concat we can roll up the platform column into a comma-separated string per-date, ordered by its count:

group_concat(platform order by ct desc) platform

That’ll give us all the platforms, with the highest-performing platform first.

The complete query looks like this:

select 
  dt, 
  group_concat(platform order by ct desc) platform
from (
  select date(created_at) dt, platform, count(1) ct
  from gameplays
  group by 1, 2
) t
group by 1

Which gives us results like this:

dtplatform
2014-06-30android, iOS
2014-06-29 iOS, android
2014-06-28 iOS, android
2014-06-27iOS, android
2014-06-26

android, iOS
2014-06-25iOS, android

Perfect! The platforms with the highest counts are first in the list. Using substring_index — which grabs the first word before the comma — we can extract out only the first platform:

substring_index(
  group_concat(
    platform order by ct desc
  )
, ',',1) platform,

Once we know the highest-performing platform each day, we can use max(ct) to get the count associated with that platform.

The complete query:

select 
  dt, 
  substring_index(
    group_concat(
      platform order by ct desc
    ), ',',1
  ) platform,
  max(ct) ct
from (
  select date(created_at) dt, platform, count(1) ct
  from gameplays
  group by 1, 2
) t
group by 1

And that’s all there is to it!

Tags: