We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive.
The data industry has changed drastically over the last 10 years, with perhaps some of the biggest changes happening in the realm of data storage and processing.
The datasphere is expanding at an exponential rate, and companies of all sizes are sitting on immense data stores. And where does all this data live? The cloud.
Modern businesses are born on the cloud: Their systems are built with cloud-native architecture, and their data teams work with cloud data systems instead of on-premises servers.
The proliferation of cloud options has coincided with a lower bar to entry for younger companies, but businesses of all ages have seen the sense of storing their data online instead of on-premises.
The increased interest in cloud storage (and increased volume of data being stored) coincides with an increased demand for data processing engines that can handle more data than ever before.
The shift to the cloud has opened a lot of doors for teams to build bolder products and infuse insights of all kinds into their in-house workflows, user apps, and more.
The cloud is the future, but how did we get here?
Let’s dig into the history of the traditional data warehouse versus cloud data warehouses.
Data warehouse vs. databases
The boosted popularity of data warehouses has caused a misconception that they are wildly different from databases. While the architecture of traditional data warehouses and cloud data warehouses does differ, the ways in which data professionals interact with them (via SQL or SQL-like languages) is roughly the same.
The primary differentiator is the data workload they serve. Let’s explore:
online analytical processing (OLAP)
online transaction processing (OLTP)
|Write once, read many||Write many, read many|
|Best for large table scans||Best for short table scans|
|Typically a collection of many data sources||Usually one source that serves an application|
|Petabyte-level storage||Terabyte-level storage|
|Columnar-based storage||Row-based storage|
|Lower concurrency||Higher concurrency|
|Examples: Redshift, BigQuery, Snowflake||Examples: Postgres, MySQL|
Given that both data warehouses and databases can be queried with SQL, the skillset required to use a data warehouse versus a database is roughly the same. The decision as to which one to use then comes down to what problem you’re looking to solve.
If there’s a need for data storage and processing of transactional data that serves an application, then an OLTP database is great. However, if the goal is to perform complex analytics on large sets of data from disparate sources, a warehouse is the better solution.
Before we look at modern data warehouses, it’s important to understand where data warehouses started to see why cloud data warehouses solve many analytics challenges.
Traditional vs. Cloud Explained
Traditional data warehouses
Before the rush to move infrastructure to the cloud, data being captured and stored by businesses was already increasing, and thus there was a need for an alternative to OLTP databases that could process large volumes of data more efficiently. The business began to build what are now seen as traditional data warehouses.
A traditional data warehouse is typically a multi-tiered series of servers, data stores, and applications.
While the organization of these layers has been refined over the years, the interoperability of the technologies, the myriad software, and orchestration of the systems make the management of these systems a challenge.
Further, these traditional data warehouses are typically on-premises solutions, which makes updating and managing their technology an additional layer of support overhead.
Cloud data warehouses
The traditional data warehouses solved the problem of processing and synthesizing large data volumes, but they presented new challenges for the analytics process.
Cloud data warehouses took the benefits of the cloud and applied them to data warehouses — bringing massive parallel processing to data teams of all sizes.
Software updates, hardware, and availability are all managed by a third-party cloud provider.
Scaling the warehouse as business analytics needs grow is as simple as clicking a few buttons (and in some cases, it is even automatic).
The warehouse being hosted in the cloud makes it more accessible, and with a rise in cloud SaaS products, integrating a company’s myriad cloud apps (Salesforce, Marketo, etc.) with a cloud data warehouse is simple.
The reduced overhead and cost of ownership with cloud data warehouses often makes them much cheaper than traditional warehouses.
Cloud data warehouses in your data stack
We know what data warehouses do, but with so many applications that have their own databases and reporting, where does the warehouse fit inside your data stack?
To answer this question, it’s important to consider what a cloud data warehouse does best: efficiently store and analyze large volumes of data. The cloud data warehouse does not replace your OLTP database, but instead serves as a repository in which you can load and store data from your databases and cloud SaaS tools.
With all of your data in one place, the warehouse acts as an efficient query engine for cleaning the data, aggregating it, and reporting it — often quickly querying your entire dataset with ease for ad hoc analytics needs.
In recent years, there has been a rise in the use of data lakes, and cloud data warehouses are positioning themselves to be paired well with these. Data lakes are essentially sets of structured and unstructured data living in flat files in some kind of data storage. Cloud data warehouses have the ability to connect directly to lakes, making it easy to pair the two data strategies.
A data-driven future powered by cloud data warehouse technologies
The three most popular cloud data warehouse technologies are Amazon’s Redshift, Snowflake, and Google’s BigQuery. They each handle the same workloads relatively well but differ in how computing and storage are architected within the warehouse.
While they’re all great options, the right choice will be based on the scaling needs and data type requirements of the business. Beyond that, the pricing structure for the three varies slightly, and based on the use case, certain warehouses can be more affordable than others.
As the number of cloud data warehouse options on the market grows, niche players will rise and fall in every industry, with companies choosing this or that cloud option based on its ability to handle their data uniquely well.
Whatever your company does and wherever you’re trying to infuse insights, be it into workflows or customer-facing apps, there’ll be a cloud option that works for you.
The future is in the clouds, and companies that understand this and look for ways to put their data in the right hands at the right time will succeed in amazing ways.
Adam Luba is an Analytics Engineer at Sisense who boasts almost five years in the data and analytics space. He’s passionate about empowering data-driven business decisions and loves working with data across its full life cycle.