We live in a world of data: there’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways Data Teams are tackling the challenges of this new world to help their companies and their customers thrive.

Data warehouses have changed the way the world deals with data. They are already deployed at countless companies across every industry and the market for these storage and computing solutions could top out at over $30 billion by 2025. This is all a far cry from the time when storing information was the biggest challenge an organization had to overcome. Organizations did endless analysis around what machines to purchase and how to best maintain them, then struggled with hiring professionals to handle their installation and upkeep. 

Today, companies can buy virtually limitless storage and computing power, without bothering with the server machines themselves. Modern cloud data engineers will never see the machines where their data and sometimes their entire code lives; their challenges now center around connecting data in the cloud with other systems to generate insights from those massive stores of data. Those insights inform business decisions, so data teams need to ensure they get into the hands of colleagues and end-user customers. Thankfully there are a variety of capabilities to increase the effectiveness and efficiency of such work.

Materialized views are one way to make gleaning insights and performing advanced analyses simpler, faster, and often cheaper, in an era where cloud-as-a-service is the dominant model. This article only scratches the surface of the ways materialized views of all kinds, from SQL server materialized views and Postgres materialized views to a wide array of other systems are changing analytics and data.

packages-CTA-banners_Cloud-Data-Teams

What is a materialized view, anyway?

In simplest terms, a materialized view can be thought of as the results of a query saved as a table. The name “materialized view” can be a bit confusing: The difference between a view and a materialized view is that, in a database or data warehouse, a view refers to a virtualized table representing the results of a query. A materialized view differs in that it is stored as a physical table in the database (taking up disk space in the server), where the information underlying the query can be updated as needed. 

This concreteness also helps simplify more complex data prep by taking the results of a query you might run multiple times (at the visualization layer) and keeping it at the ready, instead of creating it anew every time you need it. Materialized views not only improve BI performance but also allow you to consolidate business logic and maintain a source of truth.

Why use a materialized view?

In many data warehouses, users are charged per query: either through the volume of computing resources consumed which takes away from other queries, compute time, or storage scanned. If you have a huge dataset, you don’t want to run an oft-used query (one that’s referred to by multiple other charts and dashboards) over and over again. It’s much more efficient and cost-effective to prepare and aggregate the data once, through a materialized view, and query against that dataset. We tend to consider materialized views another tool at our disposal for performance boosts, with improvements akin to those we might see with setting primary keys or indexes.

For instance, since the materialized view lives in your data warehouse, data engineers and other cloud data experts can use SQL to clean and transform the data in-warehouse, instead of transforming it as it’s being presented in a visual layer or ingested by other analytic apps. 

Beyond analytics, data scientists are often the benefactors of materialized views when building advanced statistical models to answer predictive questions, using this cleaned, prepared data for their models. Typically we see these teams use their more native languages like Python and R to clean the data, but with that being handled in a data warehouse, they can pull directly from the materialized views. These advanced languages are an organization’s gateway to machine learning applications like sentiment analysis on text (consumer reviews, social media, etc.) and other more complex tasks. With all the data prep already done in the data warehouse, there’s no need to clean the data, and these tools can be utilized for what they do best.

How to create a materialized view

Creating a materialized view varies from system to system, but usually involves some version of the “CREATE” command, often “CREATE MATERIALIZED VIEW.” Simple, right? From there, just insert the underlying query and pull in your results. The system will create and store the database object, refreshing the data on the schedule you dictate.

But what does that look like? While different databases have different syntax for SQL, here is some sample code you can use to get you started with building your own materialized view from your SalesForce data.

Building your SQL server materialized view

To build a materialized view in SQL, start with this code.

CREATE MATERIALIZED VIEW sales_reporting.account_roll_up AS 
select
  account.id
  , account.name
  , account.monthly_revenue * 12 as annual_revenue
  , count(opportunity.id) as total_opps
  , sum(opportunity.amount) as total_opp_revenue
  , avg(opportunity.amount) as average_opp_revenue
from
  account
  join opportunity on
    account.id = opportunity.account_id
group by
  account.id
  , account.name
  , account.mrr
[;]

Experiment to win with materialized views

Organizations of all kinds are dealing with larger and larger datasets. Materialized views are a way to improve efficiency and cost-effectiveness when dealing with immense amounts of data and can also be useful for advanced analytics use cases. Whatever data warehouse you’re using and whatever you’re using it for, chances are there’s a place you can use a materialized view. Try it for yourself and see the improvements they offer.

packages-CTA-banners_Cloud-Data-Teams

Chris Meier is a Manager of Analytics Engineering for Sisense and boasts 8 years in the data and analytics field, having worked at Ernst & Young and Soldsie. He’s passionate about building modern data stacks that unlock transformational insights for businesses.

Tags: | | | | |