Keeping your Redshift fast requires maintenance. As your query patterns change you’ll want to update the dist and sort keys to improve performance.
And if you’re appending to or updating a table, you need to vacuum it every so often to resort the data and remove dead space.
Smaller is Easier
All of this is easy on small tables. vacuums are fast and changing keys is as easy as creating a new table and using select into to transfer the data.
But this doesn’t work on large tables. Either there simply isn’t enough extra space to duplicate the data, or the vacuum locks the table for far too long.
Here’s an issue that a client came up against: They had an events table that was far too large to work with as a single table. So we broke it up into many smaller tables and wrapped them in a view.
Break up Your Table
First, create the smaller tables (we’re using date ranges since it’s an events table):
create table events_20140101_20140201 (like events)
And insert the relevant data into the sub-table:
insert into events_20140601_20140701 ( select * from events where event_time between '2014-06-01' and '2014-07-01' )
Delete the data from events once it’s been transferred and repeat the process until events is a manageable size.
Now that we have many smaller tables, rename events to events_current and only add new data to events_current. When events_current gets large, archive data from it to a dated events table.
Create a View and Merge Your Sub-tables
Finally, create events as a view using union all to merge the sub-tables:
create or replace view events as ( select * from events_current union all select * from events_20140601_20140701 union all select * from events_20140701_20140801 union all select * from events_20140901_20141001 union all select * from events_20141001_20141101 union all select * from events_20141101_20141201 )
Queries working with events will work just like before, and the smaller tables make the maintenance much easier since you can work with them one at a time.
Changing dist and sort keys is back to being a simple select into, and you never need to vacuum the dated sub-tables if they never get changed.