“There is no greater impediment to the advancement of knowledge than the ambiguity of words,” Thomas Reid once said. That rings especially true today, when companies throw around buzzwords like “big data” or “easy-to-use BI” and only loosely define them, if at all, leaving everyone unsure of their meaning. Researching and choosing the best business intelligence solution for your company is challenging enough, so here are some definitions to help you cut through the marketing hype, understand the technical terms and their nuances, and know what to look for in a BI tool.
“Big Data”

Definition: When storing, handling, and reading data becomes complicated due to sheer data size.
Problem: It’s completely relative, rendering the phrase ambiguous.
Solution: Determine your own definition of “big data” by running a proof of concept (POC) on your own data.
Big data is relative: people hit different data limitations depending on their hardware and software, how many users are running queries, whether multiple queries run concurrently, and so on. A warning sign that a company has big data is when it must start splitting data up and managing it in different locations.
For example, this happens when companies rely on Excel for data management and accumulate more rows than a single Excel sheet can hold (1,048,576 rows). The problem of big data becomes apparent once they need to build one report that draws on multiple Excel sheets, which is complicated now that the data is scattered across locations. Here are two common data limitations that define big data very differently:
- Disk storage measures big data in terabytes: A commodity disk holds a few terabytes, so when a company starts running out of storage space, big data can be defined as more than a few terabytes.
- Memory (RAM) storage measures big data in gigabytes: Commodity hardware typically carries 64-128 GB of memory, which means that once your data no longer fits in RAM (more than 128 GB), you have a big data querying problem. RAM is consumed both by the data itself and by the intermediate calculations queries produce, so the more queries and users accessing the data, the more RAM is used, and that can put your company over the top even if the actual data isn’t big. In fact, some BI software companies recommend that prospective buyers factor in an additional 10-20% of RAM for each user who plans on using the software, on top of the RAM needed to store the data. Remember: the more users and queries, the more RAM you need.
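To make that sizing guideline concrete, here is a rough back-of-the-envelope sketch in Python. The 15% per-user overhead is just an assumed midpoint of the 10-20% range mentioned above, not a figure from any specific vendor:

```python
def required_ram_gb(data_gb, users, per_user_overhead=0.15):
    """Estimate RAM needed: the data itself plus a per-user
    overhead for intermediate query results (assumed here to be
    15%, the midpoint of the 10-20% range vendors suggest)."""
    return data_gb * (1 + users * per_user_overhead)

# 40 GB of data queried by 20 concurrent users:
print(required_ram_gb(40, 20))  # 40 * (1 + 20 * 0.15) = 160.0
```

Note that even a modest 40 GB dataset blows past the 128 GB commodity ceiling once twenty users are querying it concurrently, which is exactly how a company with "small" data ends up with a big data problem.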
The solution is the same for both types of big data problem: you need a data management tool with ETL capabilities to trim the data down to size or split it up into consumable pieces. Today, checking how much data you can query in a particular BI tool shouldn’t cost you anything; simply ask for a POC on your own data and see for yourself whether you have big data. If you do, calculate carefully and plan for the future by choosing a tool that can comfortably scale.
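The “split it up into consumable pieces” idea boils down to a simple batching pattern. This sketch is not any particular ETL product’s API, just the generic technique in plain Python:

```python
def chunks(rows, size):
    """Yield successive fixed-size batches from a row iterable,
    so that each batch fits comfortably in memory."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

# Process 10 rows in batches of at most 4:
for batch in chunks(range(10), 4):
    print(len(batch))  # prints 4, 4, 2
```

Real ETL tools layer transformation, cleansing, and loading logic on top, but at their core they stream data through in bounded pieces rather than holding it all at once.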
“Data Visualization”

Definition: A pretty, big-picture view of all your data.
Problem: Beautiful graphs don’t equal meaningful insights. You will only see a complete view of your data if the technology behind it is powerful enough to take all your data into account.
Solution: Make sure a BI solution has an equally impressive back-end technology.
Data visualization can be like putting lipstick on a pig: beautiful reports and graphs don’t necessarily mean accurate and meaningful insights. Good-looking dashboards with sleek visualizations can sway people into thinking a BI tool is a solid one. The function of data visualization is to help people understand all their data at a glance, highlighting insights that are impossible to see by scanning a long list of values and numbers. But it’s important to realize that data visualizations can only represent data that the back-end technology is able to handle and deliver. If the data isn’t there, all the glitzy visualization in the world won’t provide meaning.
Think of your BI tool as a keyhole view into your data. If the keyhole is large enough to show you the full picture, meaning the technology can take all your data into account, then the data visualizations will also show you the full picture. If the keyhole is small, so that only a fraction of your data can be taken into account at a time, the visualizations may paint a pretty picture, but they are meaningless if you are trying to grasp a new insight.
If you want data visualizations that represent all your data, choose a BI tool whose back-end technology is strong enough to handle big data, data in multiple places, or both.
“Easy to Use”
Definition: ‘Easy to use’ rarely refers to a definable quality, as the phrase is used abstractly.
Problem: What’s easy to use for an IT admin or a developer is not easy to use for a business user.
Solution: Take advantage of the free trials prospective BI tools offer by having business users try each tool and see on what terms it is “easy to use.”
“Easy-to-manage, multiprocessing cluster computers,” states Rocketclac. Does managing multiprocessing clusters sound easy to you? Every time marketing material describes a BI tool as “easy to use,” you should ask: Easy for whom? Easy to do what? If the marketing material is any good, you should be able to answer those questions by reading further.
More importantly, if you’re looking at solid software, you should also be able to test the tool with a free trial so you and your team can actually experience what it’s like to manage your data in that particular solution. For example, it’s telling to see whether a business user can consolidate the data without depending on IT. Only after a hands-on trial can you and your team truly determine whether a BI tool is easy to use, and in what respects. Remember, what’s easy for a data scientist or IT admin is not easy for a non-technical business user.
“Data Scientist”

Definition: Part analyst, part artist; both a technical genius and a business guru. A unicorn?
Problem: A data scientist is not one superhuman; it’s a team of people.
Solution: Instead of hiring an entire team of highly technical people, choose a BI solution that is full stack (back end and front end) to simplify technical challenges such as configuring and maintaining a data warehouse, the ETL process, and so on.
The term data scientist is thrown around in the tech industry to describe a jack-of-all-trades with technical knowledge in an array of disciplines, such as analytics, computer science, modeling, and statistics, plus savvy business smarts. By trying to hire one brilliant data scientist, or worse, trying to morph your top IT person into one, you’ll create a detrimental bottleneck. And by trying to build a whole team of data scientists, especially on a budget, you’re looking at a long wait to find the right people and unavoidably large expenses.
Instead, focus on choosing a BI tool built on solid technology that allows business users without technical skills to perform BI tasks such as accessing data, joining multiple data sources, and building reports and dashboards, all without involving technical staff such as data scientists, IT admins, or developers.