You can’t talk about data analytics without talking about data modeling. These two functions are nearly inseparable as we move further into a world of analytics that blends sources of varying volume, variety, veracity, and velocity. The reason is simple: before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable.

According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021! With that much data flowing into analytics systems, the right data model is vital to helping your users derive actionable intelligence from those sources.

In this article, we’ll dig into what data modeling is, provide some best practices for setting up your data model, and walk through a handy way of thinking about data modeling that you can use when building your own.

What is data modeling?

Data modeling is a sprawling topic but, at its core, it is the function that takes data in one structure and outputs it in another structure. The output structure is perhaps the most interesting and ultimately should be the key driver for how we model our data. In order to start modeling (or building a plan for your model), start by considering how you intend to use your data.

When it comes to data modeling, function determines form. Let’s say you want to subject a dataset to some form of anomaly detection; your model might take the form of a single event stream that can be read by an anomaly detection service. Perhaps the most common application of a data model is internal analysis and reporting through a BI tool. In these cases, we typically see raw data restructured into facts and dimensions that follow Kimball modeling practices. Ultimately, the people or services consuming your data need to drive the final structure that your data model will take.
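To make the facts-and-dimensions shape concrete, here is a minimal, hypothetical star schema sketched with Python’s built-in sqlite3 module. Every table and column name here is invented for illustration, not taken from any real schema:

```python
import sqlite3

# A toy star schema: one fact table of orders plus a customer dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# A BI tool slices the additive fact (amount) by a dimension attribute (region).
rows = conn.execute("""
    SELECT d.region, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer d USING (customer_id)
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(rows)  # [('APAC', 40.0), ('EMEA', 124.0)]
```

The point of the shape, not the data: facts hold the measurable events, dimensions hold the descriptive attributes the consumer wants to slice by.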

Working the (data modeling) process

After you decide on the final application for your data model, the next important consideration is the architecture of your analytics and BI tool. 

We’re going to nerd out for a minute and dig into the evolving architecture of Sisense to illustrate some elements of the data modeling process: Historically, the data modeling process that Sisense recommended was to structure data mainly to support the BI and analytics capabilities/users. Companies mainly wanted to dig into their own datasets, pull out actionable intelligence, and make smarter decisions to evolve their businesses.

But this was only the tip of the analytics iceberg. Data teams dealing with larger, faster-moving cloud datasets needed more robust tools to perform deeper analyses and set the stage for next-level applications like machine learning and natural language processing. To support these advanced use cases, data was structured in Cloud Data Teams by Sisense (formerly Periscope Data) to simplify SQL queries on dashboards by creating varying levels of granularity in the tables visible to the tool.

In determining how we wanted to merge these data stacks and how we wanted to model our data, we also revisited our data strategy to consider what we as a data team were striving to achieve within the business. Harvard Business Review presents a range of data strategies from defensive to offensive: The former is more closely aligned to ensuring data is accurate for business reporting through the creation of a single source of truth, while the latter is focused more on using data for “supporting business objectives such as increasing revenue, profitability, and customer satisfaction.” Both of these concepts resonated with our team and our objectives, and so we found ourselves supporting both to some extent.

Understanding data modeling architecture

So, we’ve determined that the final use of a data model influences its form, as does the architecture of the system in which you’re building the data model. So far, so good.

With our strategy in mind, we factored in our consumers and consuming services, which are primarily Sisense Fusion Analytics and Cloud Data Teams. The new architecture requires that data be structured in a dimensional model to optimize for BI capabilities, but it also allows for ad hoc analytics with the flexibility to query clean and raw data. Interestingly, this ad hoc analysis benefits from a single source of truth that makes it easy to query raw data alongside the cleanest data (i.e., let’s look at the raw product feature logs for customers in X industry paying us Y dollars). The upshot is that users can create models with a combination of relatively raw data, data cleaned consistently for a single source of truth, and a set of facts and dimensions that allow for fast and flexible analytics.
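As a hedged sketch of that kind of ad hoc query, here is raw-plus-clean in miniature using Python’s sqlite3 module: raw feature logs joined to a cleaned customer dimension, filtered by industry and revenue. All table names, columns, and values are invented for illustration and are not Sisense’s actual schema:

```python
import sqlite3

# Raw logs queried alongside a cleaned dimension (the single source of truth).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_feature_logs (customer_id INTEGER, feature TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER, industry TEXT, arr REAL);
    INSERT INTO raw_feature_logs VALUES (1, 'export'), (1, 'share'), (2, 'export');
    INSERT INTO dim_customer VALUES (1, 'retail', 50000), (2, 'finance', 90000);
""")

# "Feature usage for customers in X industry paying us Y dollars."
rows = conn.execute("""
    SELECT l.feature, COUNT(*) AS uses
    FROM raw_feature_logs l
    JOIN dim_customer d USING (customer_id)
    WHERE d.industry = 'retail' AND d.arr >= 40000
    GROUP BY l.feature
    ORDER BY l.feature
""").fetchall()
print(rows)  # [('export', 1), ('share', 1)]
```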

The right model for your system

Whatever system you use for analytics and BI, you need a data flow that works for you and for your platform. Your data flow should clean your data, make it BI ready, and then generate a single source of truth for reporting. (This design philosophy was adapted from our friends at Fishtown Analytics.)

Here at Sisense, we think about this flow in five linear layers:

Data modeling

Raw
This is our data in its raw form within a data warehouse. We follow an ELT (Extract, Load, Transform) practice, as opposed to ETL, in which we opt to transform the data in the warehouse in the stages that follow.
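A minimal illustration of the ELT idea, using Python’s csv and sqlite3 modules: the source rows are extracted and loaded verbatim (everything lands as text), and all transformation is deferred to the later layers. The payload and table name are invented:

```python
import csv
import io
import sqlite3

# Extract and Load only: raw rows go into the warehouse untouched.
source = io.StringIO("id,signup_ts\n1,2021-03-01T09:00:00Z\n2,2021-03-02T17:30:00Z\n")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_signups (id TEXT, signup_ts TEXT)")  # everything stays text
conn.executemany(
    "INSERT INTO raw_signups VALUES (?, ?)",
    list(csv.reader(source))[1:],  # skip the header row; no cleaning, no casting
)
count = conn.execute("SELECT COUNT(*) FROM raw_signups").fetchone()[0]
print(count)  # 2
```

Deferring the Transform step keeps the raw layer a faithful copy of the source, which makes reprocessing possible when downstream logic changes.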

View
These aren’t really “transformations”; instead they are logical representations of the tables in the Raw database. We use this layer to protect our data modeling pipelines from schema changes (i.e., data type changes, altered columns, or perhaps a need to change the source) that could render the raw data incorrect or unusable.
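The view layer can be sketched as a thin SQL view that renames and casts raw columns, so downstream models depend on the view rather than the raw table; if the source schema changes, only the view definition has to change. A toy example with invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_signups (id TEXT, signup_ts TEXT)")
conn.execute("INSERT INTO raw_signups VALUES ('1', '2021-03-01T09:00:00Z')")

# The view is a logical representation, not a transformation: it renames and
# casts so downstream models are insulated from raw schema changes.
conn.execute("""
    CREATE VIEW v_signups AS
    SELECT CAST(id AS INTEGER) AS user_id,
           signup_ts           AS signed_up_at
    FROM raw_signups
""")
row = conn.execute("SELECT user_id, signed_up_at FROM v_signups").fetchone()
print(row)  # (1, '2021-03-01T09:00:00Z')
```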

Staging
A staging table is the output of a set of transformations applied to views or other staging tables. Each step is broken out into its own transformation to optimize for reusability, performance, and readability. Ideally, these are tables that reference as few columns and data sources as possible, each accomplishing one of the following steps:

  1. Cleaning (e.g., filling in nulls, changing time zones, formatting strings, conditional logic, etc.)
  2. Enriching (e.g., categorizing, organizing, joining in supplemental attributes)
  3. Mapping (e.g., building connections via business logic between two data sources)
  4. Merging (e.g., putting two similar datasets into a single table, such as unioning)
  5. (De)Normalizing (e.g., transposing, or un-nesting datasets)
  6. Aggregating (e.g., reducing granularity, roll-ups)
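Three of the steps above — cleaning, enriching, and aggregating — can be sketched as chained transformations. In a real pipeline each would be its own staging table; here they are CTEs in one sqlite3 query for brevity, with invented names and data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v_orders (order_id INTEGER, amount REAL, coupon TEXT);
    INSERT INTO v_orders VALUES (1, 120.0, 'SPRING'), (2, 80.0, NULL), (3, 200.0, NULL);
""")
rows = conn.execute("""
    WITH stg_orders_clean AS (        -- 1. Cleaning: fill in nulls
        SELECT order_id, amount, COALESCE(coupon, 'none') AS coupon
        FROM v_orders
    ),
    stg_orders_enriched AS (          -- 2. Enriching: categorize order size
        SELECT *, CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END AS size
        FROM stg_orders_clean
    )
    SELECT size, COUNT(*) AS orders   -- 6. Aggregating: reduce granularity
    FROM stg_orders_enriched
    GROUP BY size
    ORDER BY size
""").fetchall()
print(rows)  # [('large', 2), ('small', 1)]
```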

BI
The BI tables are our fact and dimensional tables built off the transformed tables in the staging layer and are used to power our analytics within BI tools.

Reporting
Reporting contains the flattest and most cleaned version of our data. It often will collapse the metrics in a fact table to the level of a single dimension through a form of aggregation or lookback window. This provides a very flat and wide table to optimize for querying our data by providing lifetime, lookback, first day, last day, or current day metrics alongside dimensional attributes. These types of tables are sometimes called one big table or OBT, and for us serve as the pinnacle of our single source of truth.
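A toy version of such a reporting table: the orders fact collapsed to one row per customer, with a lifetime metric, a lookback-window metric, and a first-day attribute side by side. The schema and the cutoff date are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO fact_orders VALUES
        (1, '2021-01-05', 50.0), (1, '2021-06-01', 70.0), (2, '2021-06-02', 30.0);
""")

# One wide row per customer: lifetime, lookback, and first-day metrics together.
rows = conn.execute("""
    SELECT customer_id,
           SUM(amount)                                                AS lifetime_revenue,
           SUM(CASE WHEN order_date >= '2021-05-15' THEN amount END)  AS recent_revenue,
           MIN(order_date)                                            AS first_order_date
    FROM fact_orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 120.0, 70.0, '2021-01-05'), (2, 30.0, 30.0, '2021-06-02')]
```

The flat, wide shape is the point: a dashboard can answer lifetime and lookback questions from one table without joining back through the fact layer.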

Unlocking game-changing intelligence with data models

We want the power to analyze data quickly, and dimensional modeling of our data paired with BI and analytics functionalities in the right analytics platform provide just that. The atomic nature of the transformations that precede the BI layer plus our view and reporting layer allow for flexible ad hoc analytics through Sisense.

Importantly, both workflows for data analytics are supported by a set of data models that follow the same data pipeline. This decreases opportunities for metrics derived in different workflows to deviate from each other, and it also reduces the overall complexity and cost of the data modeling pipeline.

Whatever you are looking to do with your analytics platform, it’s important that you choose a tool that can connect to all your data sources and help you prepare data models that will support your needs. Building the right data model will help you get the most out of your data and uncover game-changing actionable intelligence that you can embed into workflows, present to users, and use to evolve your business.

Attempting to learn more about the role of big data (here taken to mean datasets of high volume, velocity, and variety) within business intelligence today can sometimes create more confusion than it alleviates, as vital terms are used interchangeably instead of distinctly. However, when investigating big data from the perspective of computer science research, we happily discover much clearer use of this cluster of confusing concepts.

Before we dive into the topics of big data as a service and the analytics applied to it, let’s quickly clarify data analytics using an oft-used application of analytics: visualization!

Looking at the diagram, we see that Business Intelligence (BI) is a collection of analytical methods applied to big data to surface actionable intelligence by identifying patterns in voluminous data. As we move from right to left in the diagram, from big data to BI, we notice that unstructured data transforms into structured data. The implication is that methods of data analytics are applied to big data, the methods of data preparation and data mining for example, to bring us closer and closer to the goal of distilling useful patterns, knowledge, and intelligence that can drive actions in the right hands. 

Hopefully this clarifies these complex concepts and their place in the larger analytics process, even though it’s common to see pundits and outlets tout BI or big data as if they were ends in themselves.

AI-driven analytics is a complex field. The bottom line is that datasets of all kinds are rapidly growing, leading organizations to investigate big data reporting tools or even approach companies whose whole business model can be summed up as “big data as a service” in order to make sense of them. If you’ve got big data, the right analytics platform or third-party big data reporting tools will be vital to helping you derive actionable intelligence from it. And one of the best ways to implement those tools is to embed third-party plugins.

Big data challenges and solutions

When you have big data, what you really want is to extract the real value of the intelligence contained within those possibly zettabyte-scale stores of would-be information. To best understand how to do this, let’s dig into the challenges of big data and look at a wave of emerging issues.

For starters, the rise of the Internet of Things (IoT) has created immense volumes of new data to be analyzed. IoT sensors on factory floors are constantly streaming data into cloud warehouses and other storage locations. 

These rapidly growing datasets present a huge opportunity for companies to glean insights like:

  • Machine diagnostics, failure forecasting, optimal maintenance scheduling, and automatic ordering of repair parts. Intelligence derived from these systems can even be fed to HR teams to improve service staffing, which in turn feeds enterprise HR management and performance solutions (AI-based analytics reporting into ERP solutions)
  • Shipping data on assembled products also feeds directly into ERP and supply chain solutions, improving customer awareness and experience

To put it bluntly, the challenge we face is that no cloud architecture yet exists which can accommodate and process this big data tsunami. How can we make sense of data that won’t fit through the enterprise service bus (ESB)? (The ESB, a middleware component of cloud systems, would be overwhelmed if a million factories all tried to extract intelligence from their sensors at once.)

One solution with immense potential is “edge computing.” Referring to the conceptual “edge” of the network, the basic idea is to perform machine learning (ML) analytics at the data source rather than sending the sensor data to a cloud app for processing. Edge computing analytics (like the kind platforms such as Sisense can perform) generate actionable insights at the point of data creation (the IoT device/sensor) rather than collecting the data, sending it elsewhere for analysis, and then transmitting surfaced intelligence into embedded analytics solutions (e.g., displaying BI insights for human users).
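A hedged sketch of the edge-computing idea in plain Python: score each sensor reading against a rolling baseline on the device and forward only the anomalies upstream, rather than streaming every reading to the cloud. The window size and z-score threshold are illustrative, not any vendor’s actual algorithm:

```python
from statistics import mean, stdev

def edge_filter(readings, window=5, z_threshold=3.0):
    """Run on the device: return only (index, value) pairs worth forwarding."""
    forwarded = []
    for i, value in enumerate(readings):
        baseline = readings[max(0, i - window):i]  # rolling window of prior readings
        if len(baseline) >= 2:
            mu, sigma = mean(baseline), stdev(baseline)
            # Forward the reading only if it deviates sharply from the baseline.
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                forwarded.append((i, value))
    return forwarded

# Six steady readings and one spike: only the spike leaves the device.
print(edge_filter([10.0, 10.1, 9.9, 10.0, 10.2, 55.0, 10.1]))  # [(5, 55.0)]
```

Filtering at the source is what relieves the ESB: a million factories forward a trickle of anomalies instead of a flood of raw sensor data.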

The pressure to adopt the edge computing paradigm increases with the number of sensors pouring out data. Edge computing solutions in conjunction with a robust business intelligence big data program (bolstered by an AI-empowered analytics platform) are a huge step forward for companies dealing with these immense amounts of fast-moving and remote data.

Big data analytics case study: SkullCandy

SkullCandy, a constant innovator in the headset and earbud space, leverages its big data stores of customer reviews and warranty claims to improve its products over time. In a twist on typical analytics, SkullCandy uses Sisense and other data utilities to dig through mountains of customer feedback, which is all text data. This is an improvement over previous processes, in which SkullCandy focused on more straightforward performance forecasting with transactional analysis.

Now that SkullCandy has established itself as a data driven company, they are experimenting with additional text analytics that can extract insights from reviews of their products on Amazon, BestBuy, and their own site. Teams also use text analytics to benchmark their performance against their competitors. 

SkullCandy’s big data journey began with building a data warehouse to aggregate its transaction data and reviews. A breakthrough in product development came from text analysis of warranty claims, through which SkullCandy was able to distinguish between product issues and customer education issues. The fact that AI-based analytics can delineate between a product problem and an education problem in free text is groundbreaking. A common pattern was that customers were returning a product as broken when in fact they simply didn’t know how to use its Bluetooth connectivity.

Data-driven product development also benefitted: Big data analytics allowed SkullCandy to analyze warranty/return data that showed that one of their headsets, which was used more during workouts than previously thought, was being returned at a higher than normal rate. It turned out that sweat was causing corrosion in terminals, leading to the returns. The outcome was to waterproof the product.  

Among the many successes SkullCandy achieved, we also see a pattern of value derived from big data.

Big Data as a Service: Empowering users, saving resources

Strictly speaking, “big data analytics” distinguishes itself as the large-scale analysis of fast-moving, complex data. Implicit in this distinction is that big data analytics ingests expansive datasets far beyond the volume of conventional databases, in essence combining advanced analytics with the contents of immense data warehouses or lakes.

In order to get a handle on these huge amounts of possible information, the AI components of a big data analytics program must necessarily include procedures for inspecting, cleaning, preparing, and transforming data in order to create an optimal data model that facilitates the discovery of actionable intelligence: identifying patterns, suggesting next steps, and supporting decision-making at key junctures.
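As a toy illustration of that inspect-clean-transform sequence, here is the same idea on a handful of records; the field names and values are invented:

```python
# Raw records as they might arrive: mixed types, missing values.
raw = [
    {"user": "a", "spend": "120.5"},
    {"user": "b", "spend": None},   # missing value caught during inspection
    {"user": "a", "spend": "30"},
]

# Inspect: drop rows that cannot be used.
inspected = [r for r in raw if r["spend"] is not None]

# Clean/prepare: cast the text field to a numeric type.
cleaned = [{**r, "spend": float(r["spend"])} for r in inspected]

# Transform: aggregate to a per-user model ready for analysis.
model = {}
for r in cleaned:
    model[r["user"]] = model.get(r["user"], 0.0) + r["spend"]
print(model)  # {'a': 150.5}
```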

Intelligence drawn from big data has real potential to transform the world, from text analysis that reveals customer service issues and product development potential to training financial models to detect fraud or medical systems to detect cancer cells. Savvy businesses will empower users, analysts, and data engineers to prepare and analyze terabyte-scale data from multiple sources — without any additional software, technology, or specialized staff.

Fortunately, it is now possible to realize all of this potential, and to avoid the cost and time of in-house development, by embedding expert third-party analytics. Given the tremendous task of big data analytics and the value of its outcomes, the natural move is to consume it as a service, and thereby reap the benefits of big data as a service as quickly as possible.

Chris Meier is a Manager of Analytics Engineering for Sisense and boasts 8 years in the data and analytics field, having worked at Ernst & Young and Soldsie. He’s passionate about building modern data stacks that unlock transformational insights for businesses.
