Sometimes it’s helpful to look at an aggregated overview of many rows. With numeric columns, it’s easy to sum or average many values, but for string columns we need something different. We can concatenate strings from multiple rows with concatenating aggregations.
Imagine we have a table of salespeople and the customers they manage:
We can compact this and make it more human-readable by flattening the many rows per salesperson into a single row:
This can be done with any concatenating aggregation. There’s group_concat in MySQL, string_agg in Postgres, and listagg in redshift.
Let’s first look at how to use each of these functions. Then, we’ll cover a few tricks with concatenating aggregations.
Redshift recently added listagg, and a corresponding window function version. The basic usage is:
select salesperson, listagg(customer, ', ') within group (order by contract_size desc) from customers group by 1
The within group clause is an uncommon bit of syntax, but all it does is allow for ordering the rows used in an aggregation. Here, it orders all of the customers by contract size, then listagg concatenates their names with a comma separator.
select salesperson, string_agg(customer , ', ' order by contract_size desc) from customers group by 1
Postgres has a robust set of aggregation functions, and you can similarly aggregate multiple rows with array_agg and json_agg.
By combining array_to_string with array_agg, you can duplicate the behavior of string_agg.
select salesperson, group_concat(customer order by contract_size desc separator ',') from customers group by 1
MySQL supports an optional distinct operator in group_concat:
select group_concat(distinct customer) from customers
MySQL has a relatively low cap of 1024 characters of the length of a group_concat result, but it can be increased by changing the group_concat_max_len system variable.
SQL Server workarounds
There’s no concatenating aggregation in SQL Server, but there are workarounds based on XML path.
While the customer names are nice, it would be better if we could include contract size in our collapsed table:
For the remaining examples, we’ll use Redshift’s dialect.
Because these concatenating aggregations concatenate expressions, and not just column names, we can easily accomplish this by pre-concatenating the customer name and contract size:
select salesperson, listagg(customer || ' $' || contract_size, ', ') within group (order by contract_size desc) from customers group by 1
Concatenating a limited number of values
In a real database, concatenating all the rows for a given group will often create a string of unwieldy length. If we use a subquery to limit the number of rows in a given group first, then our final concatenation will be more readable:
with numbered_customers as ( select *, row_number() over (partition by salesperson order by contract_size desc) from customers ) select salesperson, listagg(customer || ' $' || contract_size, ', ') within group (order by contract_size desc) from numbered_customers where row_number <= 3 group by 1
We first declare numbered_customers, which numbers every customer per salesperson, starting with the largest contracts. Then we concatenate only the first three rows for each salesperson. Nice and tidy.
We hope you learned a few tricks for formatting SQL tables for human reading.