A Deceptively Simple Query
Front and center on my daily dashboard is a question I ask all the time: How much usage has there been today?
To answer it, I’ve written a seemingly simple query:
select sum(seconds) from time_on_site_logs where created_at - interval '8 hour' > date(now() - interval '8 hour')
Notice the “- interval ‘8 hour’” operations in the where clause. Times are stored in UTC, but we want “today” to mean today PST.
Because the time_on_site_logs table has grown to 22M rows, even with an index on created_at, this query has slowed way down to an average of 267 seconds!
Ignoring the Index
As usual, running explain will tell us why it’s slow: It turns out our database is ignoring the index and doing a sequential scan!
Thanks again to pgAdminIII for the explain graphics.
The problem is that we’re doing math on our indexed column, created_at. This causes the database to look at each row in the table, compute created_at – interval ‘8 hour’ for that row, and compare the result to date(now() – interval ‘8 hour’).
Moving Math To The RHS
Our goal is to compute one value in advance, and let the database search the created_at index for that value. To do that, we can just move all the math to the right-hand side of the comparison:
select sum(seconds) from time_on_site_logs where created_at > date(now() - interval '8 hour') + interval '8 hour'
This query runs in a blazing 85ms. With a simple change, we’ve achieved a 3,000X speedup!
As promised, with a single value computed in advance, the database can search its index:
Of course, sometimes the math can’t be refactored quite so easily. But when writing queries with restricts on indexed columns, avoid math on the indexed column. You can get some pretty massive speedups.