<h2 class="wp-block-heading"><strong>Event Tables</strong></h2>



<p>Single denormalized events tables are increasingly common, especially for storing log data from mobile clients. These single-table implementations often use <a href="https://www.periscopedata.com/blog/the-lazy-analysts-guide-to-postgres-json.html">JSON blobs</a> to store properties.</p>



<p>A single table is great for quickly ingesting data. Unfortunately, it can also make real-time queries slow and cumbersome.</p>



<h2 class="wp-block-heading"><strong>User-Based Analysis on Denormalized Tables</strong></h2>



<p>A lot of data analysis involves aggregating data by user and reasoning about those users. With a denormalized table, you have to extract these user data points separately for each query, which wastes CPU time and analyst time.</p>



<p>For example, calculating paid user retention requires finding all the users who have paid. Then you join that set back to your activity table to see how those users’ behavior changes over time.</p>



<p>Since this is a core metric for your business, you’ll be running a query like this daily, if not hourly:</p>



<pre class="wp-block-code"><code>with retention_users as (
  -- users whose first event was in the last 14 days
  select 
    user_id,
    date(min(created_at)) as first_login
  from events
  group by 1
  having min(created_at) > now() - interval '14 day'
),
user_activity as (
  -- one row per user per active day in the last 7 days
  select
    user_id,
    date(created_at) as day
  from events
  where created_at > now() - interval '7 day'
  group by 1, 2
)
select 
  day, 
  -- cast to float so integer division does not truncate the ratio
  count(retention_users.user_id)::float / count(user_activity.user_id)
from user_activity
right join retention_users using (user_id)
where 
  retention_users.first_login + interval '7 day' 
    > user_activity.day 
group by 1</code></pre>



<h2 class="wp-block-heading"><strong>The Solution</strong></h2>



<p>Luckily, there’s a solution: Normalizing metadata from events tables. This can make analysis queries much faster.</p>



<p>A users table is a great place to start. It holds one row per user, along with the metadata you want to cohort on. The details depend on the analyses you’ll be doing, but common attributes include total spend, platform, marketing channel, join date, and experiment groups.</p>



<p>Here’s an example:</p>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-1-single-blog.png" alt="Users table" class="wp-image-72856"/></figure>
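
<p>As a rough sketch, a table like this could be declared as follows. The column names <code>user_id</code>, <code>first_login</code>, <code>age</code>, and <code>spend_level</code> match the queries in this post; the remaining columns and all of the types are assumptions for illustration, not the schema from the screenshot.</p>

<pre class="wp-block-code"><code>-- Hypothetical schema: types and the last three columns are assumptions.
create table users (
  user_id           bigint primary key, -- one row per user
  first_login       date,               -- join date
  age               integer,
  spend_level       text,               -- e.g. a spend bucket
  platform          text,
  marketing_channel text,
  total_spend       numeric
);</code></pre>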



<p>With such a table, your retention query simplifies to something like this:</p>



<pre class="wp-block-code"><code>with user_activity as (
  -- one row per user per active day in the last 7 days
  select
    user_id,
    date(created_at) as day
  from events
  where created_at > now() - interval '7 day'
  group by 1, 2
)
select 
  day, 
  users.spend_level,
  -- cast to float so integer division does not truncate the ratio
  count(users.user_id)::float / count(user_activity.user_id)
from user_activity
right join users using (user_id)
where users.first_login + interval '7 day' > user_activity.day 
  and users.age between 18 and 35
group by day, users.spend_level</code></pre>



<p>As a bonus, this version will run quite a bit faster, as you’re not joining the whole events table to itself!</p>



<h2 class="wp-block-heading"><strong>Creating and Updating Your Tables</strong></h2>



<p>Depending on your stack and preferences, you have lots of options for creating and updating these tables.</p>



<h4 class="wp-block-heading"><strong>Views</strong></h4>



<p>Sisense for Cloud Data Teams&#8217; Views feature will let you materialize a table with a simple select statement. You write the statement once, and Sisense for Cloud Data Teams updates the view every hour. This is an especially good option if you don’t own the database because your events are stored by a third-party service like <a href="https://amplitude.com/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Amplitude</a> or <a href="https://segment.com/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Segment</a>.</p>



<p>If you do own the database and it supports <a href="https://www.postgresql.org/docs/9.4/rules-materializedviews.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">materialized views</a>, they are a good option.</p>
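
<p>Here is a minimal Postgres sketch of that approach, defining <code>users</code> as a materialized view rather than an ordinary table. It assumes the events table has <code>user_id</code> and <code>created_at</code> columns as in the queries above; the <code>properties</code> JSON column and its <code>platform</code> key are hypothetical stand-ins for whatever your events actually store.</p>

<pre class="wp-block-code"><code>-- Assumption: events.properties is a json/jsonb column with a 'platform' key.
create materialized view users as
select
  user_id,
  date(min(created_at)) as first_login,
  max(properties->>'platform') as platform
from events
group by user_id;

-- Postgres needs a unique index before it can refresh concurrently.
create unique index users_user_id_idx on users (user_id);

-- Run on a schedule (e.g. hourly) to pick up new events
-- without blocking readers of the view.
refresh materialized view concurrently users;</code></pre>

<p>The view is only as fresh as its last refresh, so the schedule you choose sets how stale your user metadata can get.</p>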



<h4 class="wp-block-heading"><strong>Frequent ETLs</strong></h4>



<p>If you’re already ETLing data into the database, this can provide a natural home for code to create a table. You can also set it up to trigger the table creation when new events are added, keeping the metadata tables perfectly up-to-date.</p>
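
<p>One way such an ETL step might look in Postgres is an incremental upsert that only scans events loaded since the previous run. This sketch assumes <code>users</code> is an ordinary table with a primary key on <code>user_id</code>; <code>:last_run</code> is a hypothetical parameter (written in psql variable syntax) that your ETL job would supply.</p>

<pre class="wp-block-code"><code>-- Assumption: :last_run is the timestamp of the previous ETL run.
insert into users (user_id, first_login)
select
  user_id,
  date(min(created_at))
from events
where created_at > :last_run
group by user_id
on conflict (user_id) do update
  -- keep the earliest login seen so far
  set first_login = least(users.first_login, excluded.first_login);</code></pre>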



<h4 class="wp-block-heading"><strong>Database Triggers</strong></h4>



<p>If you want the metadata tables to live in your database, but don’t own your ETL, a <a href="http://en.wikipedia.org/wiki/Database_trigger" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">database trigger</a> works well. The trigger will run when new events are inserted and update your metadata tables as well. Remember to <a href="https://stackoverflow.com/questions/460316/are-database-triggers-evil" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">keep them fast</a>, or they’ll bog down your insert statements.</p>
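
<p>A minimal Postgres sketch of such a trigger is below. It assumes <code>users</code> is an ordinary table with a primary key on <code>user_id</code>; the function and trigger names are made up for illustration. The body is a single upsert, which keeps the per-insert overhead small.</p>

<pre class="wp-block-code"><code>-- Hypothetical names: upsert_user_from_event, events_upsert_user.
create or replace function upsert_user_from_event() returns trigger as $$
begin
  insert into users (user_id, first_login)
  values (new.user_id, date(new.created_at))
  on conflict (user_id) do update
    -- keep the earliest login seen so far
    set first_login = least(users.first_login, excluded.first_login);
  return new; -- the return value is ignored for AFTER row triggers
end;
$$ language plpgsql;

create trigger events_upsert_user
after insert on events
for each row execute procedure upsert_user_from_event();</code></pre>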



<h3 class="wp-block-heading"><strong>Wrapping Up</strong></h3>



<p>Denormalized log tables can be convenient when you’re logging the data. Just remember to set up normalized metadata tables on the other end!</p>


