Tony Hoare invented the null reference in 1965, and he considers it his “billion-dollar mistake” for the amount of trouble it has caused. Even today, SQL’s null value is the cause of several common mistakes.
Let’s go over the most egregious.
These two queries return the exact same result on a users table with many rows:
select * from users where deleted_at = null; -- result: 0 rows select * from users where deleted_at != null; -- result: 0 rows
How can that be? It’s because null represents an “unknown” type. This means it doesn’t make sense to compare null to anything else with normal conditional operators. Null isn’t even equal to itself:
select null > 0; -- result: null select null < 0; -- result: null select null = 0; -- result: null select null = null; -- result: null
The right way to compare values with null is with the is, and is not operators:
select * from users
where deleted_at is null; -- result: all non-deleted users select * from users where deleted_at is not null; -- result: all deleted users
If you want to check if two columns are different, you can use is distinct from:
select * from users where has_address is distinct from has_photo -- result: users with an address or -- photo, but not both
Not in Null
One handy way to filter rows is with a subselect. For example, if you wanted the users who did not have any packages, you could write a query like this:
select * from users where id not in (select user_id from packages)
But if one of the rows in packages has a null user_id, this query will return no results! To understand why this happens we need to factor the query like the SQL compiler does. Here’s a simpler example:
select * from users where id not in (1, 2, null)
Which translates to:
select * from users where id != 1 and id != 2 and id != null
As we now know, id != null is an unknown value, null. Using and on any value with null becomes null, so all of the other and conditions fall away. No rows match the resulting query since null is not equal to true.
If the condition is inverted, the query works fine. This time we’ll look for users with packages.
select * from users where id in (select user_id from packages)
Which we can simplify for the example:
select * from users where id in (1, 2, null)
This query translates to:
select * from users where id = 1 or id = 2 or id = null
Since the where clause is a list of or conditions, it doesn’t matter that one of them is null. That condition is simply ignored because non-true values do not change the evaluation of the rest of the clause.
When it comes to sorting, nulls are considered the largest possible value. This can lead to some frustrating queries when trying to sort values descending, since all the nulls will be on top.
This query is meant to show the users ranked by their points, but it’s putting users without any points first!
select name, points from users order by 2 desc; -- null points sort above -- any number of points!
There are two ways to deal with this. The easiest way is to get rid of those nulls in the output or the comparison using coalesce:
-- treat nulls as 0 in output select name, coalesce(points, 0) from users order by 2 desc; -- keep nulls in output, but sort as 0 select name, points from users order by coalesce(points, 0) desc;
And if your database supports it, you can instead tell it where to put nulls when sorting with nulls first or nulls last:
select name, coalesce(points, 0) from users order by 2 desc nulls last;
Of course, nulls can also be used to prevent errors. One great use of nulls is in dealing with divide by zero errors.
Divide by Zero
Divide by zero errors are especially painful. Queries that ran fine yesterday all of a sudden are failing with divide by zero errors. One common solution is to check if the denominator is 0 before dividing with a case statement:
select case when num_users = 0 then 0 else total_sales/num_users end;
The case statement approach is verbose and duplicates the denominator. That’s OK if the denominator is simple, but if it’s an expression, you’re likely to get more bugs if you change the query later.
Here we can use null to our advantage. Use nullif on the denominator to make the denominator null instead of zero. Rather than failing, the query will return null on days where num_users = 0
select total_sales/nullif(num_users, 0);
If you prefer the result to be 0 or anything else instead of null, use coalesce on the previous query:
select coalesce(total_sales/nullif(num_users, 0), 0); -- nulls results become 0
Tony Hoare may regret his mistake, but at least it’s easy to work around the issues that null presents. Now go forth with your new knowledge and keep null from nullifying your future query results!