How many widgets did I sell?

When you’re just starting out, chances are you’ll be asking a very simple question: How many widgets did I sell each day?

Your SQL might look something like this:

select date(created_at) as day, count(1)
from sales
where created_at > now() - interval '30 day'
group by day order by day;

With that SQL, you can expect results like these:

daycount
 2014-03-12 5
 2014-03-13 1
2014-03-19 6
 2014-03-20 4
 2014-03-21 1
2014-03-24
 2014-04-08 2
 2014-04-09 3

If you viewed the results in a data visualization tool, you’d get a graph like this:

Daily sales chart

Unfortunately, since you’re just starting out, you’re not selling widgets every day. And notice the zeroes aren’t even showing up! Your investors will not be pleased with such a misleading graph.

If you’re using Postgres, generate_series can help. generate_series produces a table with a given first point, last point, and interval that you specify. In this case, we’ll choose 30 days ago, today, and each day in between. Here’s our revised SQL:

select date(d) as day, count(sales.id) 
from generate_series(
  current_date - interval '30 day', 
  current_date, 
  '1 day'
) d 
left join sales on date(sales.created_at) = d 
group by day order by day;

By building an explicit list of dates in our query, rather than relying on the dates, in the sales table, we made sure every date showed up in our result set.

daycount
 2014-03-12 5
 2014-03-13 1
 2014-03-14 0
 2014-03-15 0
 2014-03-160
 2014-03-170
 2014-03-180
 2014-03-196
 2014-03-204
 2014-03-21 1
 2014-03-22 0
 2014-03-23 0
 2014-03-24 1
 2014-03-25 0
 2014-03-260
 2014-03-27 0
 2014-03-28 0
 2014-03-29 0
 2014-03-30 0
 2014-03-31 0
 2014-03-01 0
 2014-03-02 0
 2014-03-03 0
 2014-03-040
 2014-03-050
 2014-03-06 0
 2014-03-070
 2014-03-080
 2014-03-09 0
 2014-03-100
 2014-03-11 2

And here’s the chart:

Daily sales chart 2

That’s more like it!

Caveats and gotchas

left join: We used generate_series to make sure we got a row for every day in the last 30 days, not just a row for every day in the sales table. So be careful not to use an inner join, which will wipe out the days on the left-hand side that have no matching day on the right-hand side.

count(sales.id): A snippet from our joined table looks like this:

2014-03-21 00:00:00 1178 [email protected] Dorian
Nakamoto 4096 2014-32-21 22:19:03.36718
2014-03-22 00:00:00
2014-02-23 00:00:00
2014-03-24 00:00:00  1179 h[email protected] Harry 
Glaser 500

As intended, there are blank rows for days where we had no sales. As a result, we can’t just count every row to get number of sales. count(sales.id) will count just the rows with non-null sales IDs, which is what we want.

Other databases

Not on Postgres? Here are some generate_series equivalents that are as good or better:

Oracle’s magic dual table can help:

select sysdate - level from dual connect by level <= 30;

SQL Server’s recursive with statements are helpful and very impressive in general:

with t as (
  select 0 as d
  union all
  select d + 1
  from t
  where d < 30
) select getdate() - d from t

MySQL, unfortunately, is trickier. For some clever hacks, start with this answer on StackOverflow.

An easier way

Of course, a visualization tool like Sisense for Cloud Data Teams will continuous-ify the data for you! In our tool of choice, head to a chart’s settings and toggle “Continuous Axis” for the desired effect.

Tags: