In this post, we will cover some of the more common MySQL functions that differ in syntax from Redshift, as well as various rules and tricks to keep in mind!
General Differences Between MySQL And Redshift
One of the most common pitfalls when converting MySQL syntax to Redshift involves the group by requirements. Redshift is more stringent, and requires that all non-aggregate functions in the select statement must be included in the group by clause. As an example, MySQL syntax allows for a query like so:
select [created_at:week], country, count(1) from orders group by 1
This same query would give an error in Redshift, as it would require the country column to be included in the group by clause.
Unlike MySQL, Redshift does not allow you to define variables within a query. To compensate for this, Redshift includes Window Functions that allow you to iterate over your data in a similar manner to how a variable would be used in MySQL. Let’s take a look at a query that calculates the cumulative sum of a column.
-- MySQL: Add Cumulative Column set @iterating_variable := 0; select created_at , number_of_orders , (@iterating_variable := @iterating_variable + number_of_orders) from orders order by created_at
To accomplish the same task in Redshift, we can use the sum() window function:
-- Redshift: Add Cumulative Column select created_at , number_of_orders , sum(number_of_orders) over (order by id rows unbounded preceding) from orders order by Created_at
Now both of these queries display the number of orders by day as well as the running cumulative total of orders!
One downside to MySQL is its reliance upon subqueries. If you’re looking to use multiple subqueries/CTE’s in your main query, it can quickly get overwhelming. Redshift allows you to use With Clauses to build temporary tables that only exist within the query.
While MySQL also has temporary tables, they can only be called once within a query. Redshift’s temporary tables, created through these with clauses, can be referenced multiple times in the query!
Neither MySQL nor Redshift has a built-in function to generate a series of dates or values. However, they each have a couple of clever options to imitate this behavior. In fact, we already have a great blog post that details these methods in greater detail!
Calling Names with Spaces or Reserved Words
In terms of differences between the two SQL Types, this is actually one of the more straightforward cases. MySQL makes use of backticks to “wrap” these names, while Redshift double-quotes them. An example can be seen below where we call two tables: one with a space in its name, and one that is a reserved word.
-- MySQL select * from `Table One`, `Order` -- Redshift select * from “Table One”, “Order”
MySQL’s concat() function lets you pass in multiple strings to concatenate together. Redshift’s concat() function only allows you to pass in two strings, so you would have to nest this function in order to concatenate more than two values.
-- MySQL select concat(`Sisense ’, ‘is ’, ‘great’) -- Redshift select concat(‘Sisense ’, concat(‘is ’, ‘great’))
Redshift also has a shortcut for concatenation, using double-pipe notation in place of a function call:
select ‘Sisense ’ || ‘is ’ || ‘great’
Date/Time Specific Functions
In some of the later versions of Redshift, now() is a deprecated function. You would want to use getdate() or sysdate() to return the current time based on the timezone of your database.
From_unixtime() and Unix_timestamp()
Redshift does not have a default function equivalent to from_unixtime() that converts unix timestamps into date timestamps. Instead, we can employ some clever math shown in this Valkrysa blog post:
select timestamp ‘epoch’ + your_time_colum * interval ‘1 second’
The first part of this select statement grabs the date and timestamp that acts as the threshold for unix time (1970-01-01 00:00:00). Since unix time measures the amount of seconds that have elapsed since this date, we just need to convert it into ‘second’ literals. Now, we have our timestamp of 1970-01-01 00:00:00, and we can add the total number of seconds to get the appropriate date.
Converting from a timestamp value into unix time is simpler, and actually has two functions to support this. You can do:
select extract(epoch from your_time_column)
select date_part(epoch, time_column)
In each of the two functions, Redshift is essentially calculating the number of seconds that have elapsed from 1970-01-01 00:00:00 and your timestamp column.
In many ways, Redshift and MySQL are on the same page in terms of using interval literals with timestamps. The major difference surrounds the pickiness of syntax, as well as whether or not the Redshift instance is running on a lead node. In Redshift, best practice recommends single quoting your literal value. This would look similar to below:
select getdate() - interval ‘1 day’
For queries that do not run on the lead node, Redshift does not allow for interval values higher than a week. So attempting to add or subtract a ‘month’ or ‘year’ interval value can throw an error if the timestamp column is evaluated across multiple nodes.
Date_sub() and Date_add()
These functions act similarly to the interval literals discussed above. Redshift combines both of these functions into a single dateadd() function. You’ll notice that Redshift’s version calls for three parameters instead of two.
In order to get the same behavior of MySQL’s date_sub(), you would want to pass in a negative interval to the dateadd() function. That means your queries may look something like this:
select dateadd(day, 1, getdate()) -- Returns Same Time Tomorrowselect dateadd(day, -1, getdate()) -- Returns Same Time Yesterday
This is an important function in MySQL for getting timestamps to display in the exact way that you would like. Redshift has two similar functions depending on your use-case. to_char() takes a timestamp, and allows you to pass in a parameter that control the formatting. The to_date() function takes a string or numeric value, and uses the same format options as to_char() to control the end result.
Let’s take a look at some of the different formats we can return! Running the following query:
select getdate() as format_one, to_char(getdate(), ‘MON-DD-YYYY’) as month_day_year, to_char(getdate(), ‘Day, Month DD YYYY’) as day_month_year, to_char(getdate(), ‘YYYY-Q') as year_quarter
Returns the following values:
Redshift’s datediff() function is more robust and flexible than MySQL’s in terms of the level of specificity. MySQL’s datediff() is limited to returning a whole number value of days between two dates. Redshift’s datediff() allows you to choose what format to calculate the difference in (e.g. minutes, hours, days, weeks).
One other potential misstep to watch out for is the order of parameters. MySQL’s version of the function takes two parameters, where a positive value is returned if the first parameter is larger than the second. Redshift reverses the order of the parameters if you want to return a positive value. That means your queries would run like this:
-- MySQL: returns 1 select datediff(now(), now() - interval 1 day) -- Redshift: returns 1 select datediff(day, getdate() - interval ‘1 day’, getdate()) Day(), Week(), Month(), etc.
Redshift uses the extract() function to pull out the desired numeric value of your timestamps. Rather than having a specific function for each date interval, extract() allows you to pass in the exact interval you want to find.
-- MySQL select day(now()), week(now()), month(now()) -- Redshift select extract(day from getdate()) , extract(week from getdate()) , extract(month from getdate())
While it’s not quite as similar as Postgres, you can see that MySQL does share many similarities with Redshift! For the majority of these functions, the key difference is something as small an additional parameter, the ordering of parameters, or even just a different name.