SQL is one of the analyst’s most powerful tools. In SQL Superstar, we give you actionable advice to help you get the most out of this versatile language and create beautiful, effective queries.
The SQL order of execution defines the order in which the clauses of a query are evaluated. Some of the most common query challenges I run into could be easily avoided with a clearer understanding of the SQL order of execution, sometimes called the order of operations. Understanding query order can help you diagnose why a query won’t run, and even more frequently will help you optimize your queries to run faster.
In the modern world, SQL query planners can do all sorts of tricks to make queries run more efficiently, but they must always reach the same final answer as a query that is executed per the standard order of execution. This order is:
SQL’s from clause selects and joins your tables and is the first executed part of a query. This means that in queries with joins, the join is the first thing to happen.
It’s a good practice to limit or pre-aggregate tables before potentially large joins, which can otherwise be very memory intensive. Many modern SQL planners use logic and different types of joins to help optimize for different queries, which can be helpful but shouldn’t be relied on.
In an instance like below, the SQL planner may know to pre-filter pings. That technically violates the correct order, but will return the correct result.
select count(*) from pings join signups on pings.cookie = signups.cookie where pings.url ilike '%/blog%'
However, if you are going to use columns in a way that prevents pre-filtering, the database will have to sort and join both full tables. For example, the following query requires a column from each table and will be forced into a join before any filtering takes place.
-- || is used for concatenation select count(*) from first_names join last_names on first_names.id = last_names.id where first_names.name || last_names.name ilike '%a%'
To speed up the query, you can pre-filter names with “a” in them:
with limited_first_names as ( select * from first_names where name ilike '%a%' ) , limited_last_names as ( select * from last_names where name ilike '%a%' ) select count(*) from limited_first_names join limited_last_names on limited_last_names.id = limited_first_names.id
To learn more, you can also read about how we sped up our own queries by 50x using pre-aggregation.
The where clause is used to limit the now-joined data by the values in your table’s columns. This can be used with any data type, including numbers, strings, or dates.
where nmbr > 5; where strng = 'Skywalker'; where dte = '2017-01-01';
One frequent “gotcha” in SQL is trying to use a where statement to filter aggregations, which will violate the rules of SQL’s order of execution. This is because when the where statement is being evaluated, the group by statement has yet to be executed and aggregate values are unknown. Thus, the following query will fail:
select country , sum(area) from countries where sum(area) > 1000 group by 1
But it can be solved using the having clause, explained below.
GROUP BY Clause
Group by collapses fields of the result set into their distinct values. This clause is used with aggregations such as sum() or count() to show one value per grouped field or combination of fields.
When using group by: Group by X means put all those with the same value for X in the same row. Group by X, Y put all those with the same values for both X and Y in the same row.
The group by clause is worthy of its own post for many reasons, and you can find a lot more information about “group by” in other posts on our blog or in our whitepaper about SQL tips and tricks: