One of the most difficult marketing challenges today is attribution. Give too much credit to one marketing channel and you might mask its underperformance. Don’t give enough credit to another, and you could hide your most valuable channel.
Getting The Data Organized
Let’s start with an idealized web logs table that may be familiar to a lot of folks. We’ll have each request’s cookie, URL, referrer and timestamp. Here’s an example:
The first order of business is to attribute marketing channels to individual requests. We’ll do that with a simple case statement:
select cookie, case when url ilike '%utm_source=adwords%' then 'paid' when url ilike '%/blog%' then 'content' when referrer ilike '%facebook%' then 'social' end as channel, timestamp from web_logs
This effectively replaces the raw URL and referrer data with useful channel info:
Now, for each cookie, let’s get the channels in order for easy analysis later. We’ll use Postgres’s array_agg here. For other DBs, avail yourself of one of their alternatives.
select cookie, array_agg(channel order by timestamp) as channel_list from channel_web_logs group by 1
This gives us a nice ordered array of channels for each cookie:
Now let’s put some attribution strategies into practice!
Stack-Rank Attribution credits leads to channels in descending order of the channel’s importance. This is a helpful strategy on small teams when certain channels are much more important than others, and the team needs to make sure the most important channels are performing. Here’s an example traffic count query:
select case when 'paid' = any(channel_list) then 'paid' when 'content' = any(channel_list) then 'content' when 'social' = any(channel_list) then 'social' else 'uncategorized' end end as channel, count(1) from cookie_channels group by 1
Don’t be confused by the Postgres any function. It just checks whether the element is in the array. Here we assign a cookie to a channel in descending order of channel importance: “paid” if it’s ever come through an ad click, “content” if it’s ever viewed a blog post, “social” if it’s ever come from Facebook. Otherwise it’s “uncategorized”.
This is useful for making sure to fully attribute every paid dollar, almost fully attribute every hour spent on content, and not stress too much about other channels.
First-Touch is a very common attribution strategy. It’s useful for attributing leads more fairly, and operates under the philosophy that the user would not be here at all were it not for that first channel that brought them in. Here’s the SQL for visitors by channel, attributing via first touch:
select coalesce(channel_list, 'uncategorized'), count(1) from cookie_channels group by 1
This SQL is even simpler! We simply look at the first channel in our list of channels. If there’s no first channel, i.e. the channel_list is empty, we attribute “uncategorized.”
Last-Touch Attribution is also fairly common, and, just like it sounds, attributes cookies to the last marketing channel that they touched. This strategy is more common in freemium businesses, where free users may come from a variety of channels, but the business value is generated by the channel that converted them to paid. The SQL is very similar to our first-touch SQL:
select coalesce( channel_list[array_length(channel_list, 1)], 'uncategorized' ), count(1) from cookie_channels group by 1
Weighted Attribution seeks to evenly distribute the credit for each visitor to each channel that contributed. If a visitor has touched 4 marketing channels, then each channel should get credit for 0.25 visitors.
This strategy is common in larger marketing organizations, where it’s important to allocate resources fairly based on each channel’s performance. Here’s the SQL:
with cookie_weights as ( select cookie, 1.0 / count(1) as channel_weight from ( select cookie, unnest(channel_list) from cookie_channels ) ) select channel, sum(channel_weight) from cookie_channels join cookie_weights using (channel) group by 1
Let’s take this query step by step. In the inner select of the cookie_weights CTE, we use unnest to expand the channel array out to one row per channel. Then we assign a weight to each channel in inverse proportion to the number of channels. For example, if a cookie has 4 channels attributed, then its weight is 0.25.
Finally, we join those weights back to our original cookie_channels table and sum the weights per channel. This gives each channel its weighted sum of cookies brought into the product!