Sliding Date Ranges
One fine morning, your investor asks you a question: How many charts are there across all of Sisense for Cloud Data Teams? He also wants to know how fast that number is growing.
We’ll start simply, by pulling the number of daily new charts:
select date(created_at), count(1) from charts group by 1
The results look straightforward:
Now let’s answer the question of how many charts have ever been created as of a given date:
select d, count(charts.id) from generate_series( current_date - interval '12 months', current_date, '1 day' ) d left join charts on charts.created_at <= d group by 1
We start with our old friend generate_series to get a continuous list of dates. (You can also join a chart to itself as long as there’s at least one widget for each day.)
The range join comes into play when we join on charts.created_at <= d. For each date d, any row in charts with created_at <= d will be included in the join. This means any given row in charts will be included once for each date after its creation date. Most rows in charts will be included many times!
Essentially, we’ve made our previous graph cumulative.
Excluding Deleted Charts
Unfortunately for our simple analysis, users can delete charts. In fact, they do it all the time. That makes this graph pretty misleading.
Fixing this should be simple, right? Just filter out the deleted charts!
select d, count(charts.id) from generate_series( current_date - interval '12 months', current_date, '1 day' ) d left join charts on charts.created_at <= d where charts.deleted_at is null group by 1
Sure enough, we see a lot fewer charts:
Don’t Rewrite History
There’s one more wrinkle: That graph changes history! If I make a chart in April and delete it in June, that chart is retroactively gone from the April bucket. This is not ideal. That chart could be present in April and May, but not in June. Once April is over, the April number should never change.
Rather than filtering all deleted charts after the join, we’ll use the deleted_at timestamp in the join itself:
select d, count(charts.id) from generate_series( current_date - interval '12 months', current_date, '1 day' ) d left join charts on charts.created_at <= d and ( charts.deleted_at is null or charts.deleted_at > d ) group by 1
In our join, we specify that for each date, we want to include charts that were created before that date and deleted after that date (or never deleted at all).
This looks much better. Using range joins, we can always get an accurate picture of any previous point in time.