Data Preparation: Checking Under the Hood of Analytics Software

Say you want to buy a car. You go to a dealership and the salesman points out an attractive looking…

Say you want to buy a car. You go to a dealership and the salesman points out an attractive looking vehicle. “This is a great car”, he says, “Just look at the finishing. The amazing leather seats. The shiny new layer of red coloring.” He’s very enthusiastic so you hum politely and say, “That’s nice. What about the mileage? Gas consumption? How will this car actually get me to where I need to go?” The salesman waves your questions off and suavely responds: “Forget about all that stuff. It’s way too technical for you. Just look at these beautiful tinted windows.” Sounds a bit suspicious, doesn’t it?

Unfortunately, when it comes to Business Intelligence software, this type of shoddy salesmanship is quite common. Vendors often focus on showcasing their front-end capabilities, i.e., beautiful dashboard reporting and data visualizations, while completely ignoring the arguably more crucial aspect of analytics, namely data preparation: structuring, integrating and cleaning data to make it ready for analysis.

data prepation with analytics software

This process is what takes place ‘under the hood’ of business intelligence software and is related to the engine that powers it — and just like in a car, it’s what drives the software forward and determines its actual performance.

This article will walk you through some of the basic issues you need to address regarding data preparation when evaluating business intelligence software. To learn more about evaluating BI software, check out our free webinar on Data Visualization software.

When do you need to prepare data?

”Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.” (The New York Times)

If you’re working with very small and simple datasets, e.g. a handful of similarly structured Excel spreadsheets, data preparation will not be much of an issue, if at all, since the different pieces of data are already stored in a similar format. However if you’re evaluating BI software, it’s safe to assume you have more data than that.

The typical scenarios in which you will need to devote serious resources to preparing data include:

  • Using more than one type of data source, e.g. Excel and data from SaaS applications
  • Working with large datasets
  • Working with messy, unorganized data

If you find yourself either forced to summarize data before analyzing it because otherwise it’s just too big for your computer to handle, or involving your IT or technical departments whenever you need new information — it’s likely you’ve entered the data preparation nightmare.

This is where your business intelligence tools come in. These tools are meant to automate or at least greatly simplify the bulk of the data preparation process by using pre-programmed adapters that connect into different types of data sources, and restructuring the data into a single centralized repository.

Tips to Navigating the Evaluation Process

Most if not all BI softwares come with some built-in data integration capabilities. However, not all software is created equal, which is why you should always remember to check ‘under the hood’.

Here are 3 crucial aspects of data preparation you should be aware of when evaluating business intelligence software:

1. Access to the Original Data

When working with large or complex datasets, some software solutions will opt to pre-aggregate parts of the data before enabling end users to analyze it. In other words, some of the data is lost in the process, and end users perform calculations on a summarized version (or view) of it. In other words, these programs are typically not built to handle complex data and therefore rely on artificially shrinking and simplifying it.

While this type of solution could suffice when you’re only looking to build a dashboard that displays high level trends, it can be problematic if you are interested in data discovery and high-res exploration. In these cases you should look for tools that can connect to the raw data itself and display the data at its full granularity, while still allowing access to an integrated view of the data on which to perform analysis.

2. Joining Multiple Data Sources

An important part of data preparation usually involves joining several data sources to create a single version of the truth (you can read more in my article on mashing up data sources). The question here is if you are working with a lot of disparate sources. In these cases, software that is capable of joining different data sources could come in handy, particularly if you want to perform in-depth analysis rather than merely see the data side by side.

Let’s say part of your data is stored in a NoSQL database such as MongoDB, and the other collected by cloud applications such as Zendesk: if you’re looking to reach original insights from this data, you’ll need to be able to cross-reference the different datasets to discover relationships between them.

3. Data Management, Structuring and ETL

Before data can be analyzed it needs to be: extracted from where it is originally stored, cleansed and transformed into a useable format, then loaded to a new destination after having been structured in such way that will allow the software to process it efficiently. This is relevant both for creating an initial centralized repository of data as well as when new data needs to be brought into the mix.

During the evaluation process, you’ll want to find out whether the software you’re looking at will be able to handle ETL and data management, considering the complexity of your data. If not, you should consider the additional costs this incurs. You might need to invest in solutions for data warehousing, become overly dependant on your IT department, or be forced to repeat the same process of semi-manually cleansing and transforming data every time you want to add new sources or update the existing database.

So Remember: Check the Engine

Returning to the car metaphor: testing its engine isn’t as easy as examining the car’s paint job, but needless to say one is much more important than the other. Similarly in Business Intelligence software, it’s important not be dazzled by beautiful dashboards — while these elements are important and useful, they are relatively simple to build and will be available in almost any of the major platforms.

However, it is the software’s back-end, i.e. the engine it uses to cleanse, integrate and manage the data, that often determines its actual value to an organization. Choosing the wrong software could skew your initial price estimate when you are forced to allocate technical resources or purchase additional programs to handle data preparation. In other cases you might get a business intelligence platform that will not enable you to perform actual analysis, but merely provide an improved graphical interface for the type of reporting you already had in the first place.

Learn More

Check out these step-by-step guides to data preparation for analytics, or learn more about different types of Business Intelligence software, watch our free webinar: Data Visualization is only Half of What You Need, or read this article on the 4 Steps to Successfully Evaluating Business Analytics Software.

Tags: |