3 Steps to Choosing the Right Data Analytics Tool
“We need to understand our data now!”
If they’re not literally screaming it, businesses across the world are feeling the sentiment as they scramble to harness the power of their data. The fear of being left behind by the data revolution is spurring them to action — and many are making hasty, ill-informed choices.
If you’re ready to harness the power of data analytics, know this: the tools you choose will have strategic consequences for your company, and a wrong choice can do lasting damage. There are plenty of opportunities for this decision to go wrong.
Here are some real-life horror stories:
- A single business unit uses multiple compartmentalized data tools — resulting in conflicting data and Business Intelligence conclusions
- Due to confusion with data sources, business analysts’ queries use the wrong data — leading to disastrous decisions by C-Suite leaders
- A company engages a data warehouse provider that advertises free, easy setup and a pay-per-query fee structure — but gets surprised by the unanticipated costs of each refresh, ad hoc query and data engineer keystroke
- A data team chooses a suite of data tools for its predictable pricing, but can’t increase compute power without also buying more storage — which is both expensive and unnecessary
The bad news is that the complexity of options across the data tech stack can confuse even sophisticated data practitioners.
The good news? After working through this assessment, you’ll be prepared to avoid these outcomes. The world of data analysis tools may be scary now, but our buyers’ guide introduces the landscape and helps you arrive at a solution that will fit your business perfectly.
Step One: Understand Your Business
Understanding your business is the first and most important step. Don’t look for analytics tools until you can answer these essential questions.
What questions are you seeking to answer, and what do you expect to accomplish with the results?
Before you search for a data analysis solution, define your motives. What problems do you want to solve, and how are they important to your organization?
It might help to write out three to five explicit, quantifiable questions to steer your exploration. Consider the nature of the queries those questions require.
- Will they need constant updates to architecture, or will queries run a standard way?
- Do they pull on a mostly static data model or one that constantly changes?
Most organizations run queries of varying complexity — but understanding how much of your tactical and strategic needs will be fulfilled by more complicated, ad hoc analysis is crucial to finding the right tools.
Where is the data?
Is it in a single consolidated source or disparate sources? Data from multiple sources often requires blending into a single source of truth before it can be used in analysis — so choosing a warehouse solution becomes a necessary initial step. Think through query complexity as well as the amount and current location of your data, since some migrations will be smoother than others.
Who handles the data?
Some important questions include:
- Will IT personnel, data engineers or analysts handle the data before the consumer sees it?
- How much do you expect users of various types to collaborate?
- What data governance will you need to protect the data while empowering users who need it?
- How will you need to assign privileges and permissions to each person, position or level to influence the process?
- How many people will handle the data? (Anticipate your future growth too, since plans to scale significantly affect the best choice for your organization.)
- How do you expect your data needs to increase?
- How will your queries per day (which typically correspond to your number of users) rise?
Forgetting to account for future needs to scale can result in extra costs, timed-out queries or databases that crash for hours or days.
Ask smart questions about your users. Providers define users differently, and some use fee structures that stipulate categories of users. Some charge subscription fees by number of users; others allow unlimited users and invoice based on number of queries.
What are your “users” actually doing with the data? People who merely consume in read-only mode or consult a dashboard once a day may not need a named user license. Sisense for Cloud Data Teams, for example, can let users share links with dashboards that automatically update, so members can benefit passively from data analysis without engaging with it themselves or paying extra for that access.
Who is the consumer?
Identifying your consumers is an essential — and often overlooked — step in the process of selecting data analytics tools.
• Who are the stakeholders throughout your organization, and what are their needs and expectations for using data?
You need a holistic, companywide accounting of data analysis needs.
• How data-sophisticated are your consumers?
Are they data scientists and engineers who understand the underlying data models and can write SQL, R and Python code for complex ad hoc queries? Or are they BI users, who need intuitive UX to perform uncomplicated analyses with uniform datasets? The common case involves both types — and all shades of ability in between.
• What are their expectations for timeliness?
Data scientists who understand the details of complex queries commonly allow minutes for them to run. But they also possess the native-language skills to optimize processes when they need to. Less sophisticated data users may expect unrealistically quick results for their queries and grow impatient when screens take more than a few seconds to load. They also rely on drag-and-drop interfaces and layers, which cause slower processing times.
• Can educating these clients satisfy them, or are their expectations for speed rooted in true business requirements, rather than artificial, arbitrary timelines?
If their expectations for timeliness are relevant to business, you may need to invest in tools that can accelerate data refresh rates.
• What are their expectations for use and collaboration?
These days, most organizations recognize the power of a single source of truth, fed by disparate data stores and managed by a DBA. Nevertheless, some business units still use — and remain content with — siloed data, run through siloed data analysis processes. If you want to take advantage of cross-functional synergies, you’ll want a solution suite that matches your business’s openness to collaboration.
What is the budget?
In data analysis tools, like anything else, you get what you pay for. It’s important to consider what you need, as well as what you can afford, since most solutions are customizable to some degree. Some important questions to ask include:
- Is your data analytics tools budget part of the BI budget, or the budget for general analytics?
- Does it include the costs of essential features like warehousing and pipeline, or are those costs counted elsewhere?
- How predictable do the costs need to be?
- Do you need to lock them in for the next three years, or can your company pay different amounts as requirements change?
Some tools lock in pricing and services, but prove less flexible if your needs change. Others feature variable pricing — but may expose you to unanticipated costs if you don’t pay attention to query counts or user numbers.
Step Two: Understand Your Options — and Tradeoffs
The options you’ll weigh come in four general categories: pipeline, storage, analysis, and sharing. Some address more complex needs for data experts; others meet simpler needs for BI questions. To keep costs down, balance your requirements. As you select solutions, assess your needs in each category, as well as their mutual interdependence.
When choosing a pipeline, you need to account for the configuration and consolidation of disparate sources of your data. In the end, though, the basic choice is between ETL and ELT — or some combination of the two.
ETL (Extract, Transform, Load) is the process that conducts its constituent operations in that order. First, it extracts data from the original sources, then it transforms that data into a clean, usable form. Finally, it loads the data into the target database or data warehouse.
ETL can be appropriate for complex databases — those that, for example, are fully customizable and deeply nested, with changing data models. MongoDB stores, Cassandra databases, and sources with JSON blobs tend to be good candidates for ETL, due to their complexity.
Because the process is more complex and resource-intensive during the transform phase, ETL is typically more appropriate for organizations with data engineering professionals. In general, ETL costs more, but its customizability increases the chance that you conduct precisely the analysis you need.
ELT (Extract, Load, Transform) reverses the order of the last two steps, so that loading into the target source takes place prior to any transformation. A much more recent pipeline development than the legacy ETL, ELT is usually quicker and easier (and often cheaper), yet less adept with complex datasets and relationships. So, for example, standard pulls from Google AdWords, HubSpot or Facebook are appropriate for ELT, since they require very little data model customization (although some amount of transformation always takes place before loading, even with ELT).
Quick and easy loading then allows you to determine how data relates once it arrives at its destination. ELT is typically easier and less expensive than ETL, and it tends to be more appropriate for predictable, non-customizable data models. More complex data models, however, experience problems with ELT.
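The difference in ordering can be made concrete with a small sketch. The example below is purely illustrative — the tables, fields and JSON payloads are hypothetical, and an in-memory SQLite database stands in for the warehouse. It assumes a SQLite build with the JSON functions (bundled with recent Python releases).

```python
import json
import sqlite3

# Hypothetical raw records, standing in for a nested JSON API extract.
raw = [
    {"id": 1, "user": {"name": "Ada", "plan": "pro"}, "spend": "19.99"},
    {"id": 2, "user": {"name": "Grace", "plan": "free"}, "spend": "0.00"},
]

db = sqlite3.connect(":memory:")  # stand-in for the target warehouse

# --- ETL: transform (flatten, cast types) BEFORE loading into the target ---
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, plan TEXT, spend REAL)")
rows = [(r["id"], r["user"]["name"], r["user"]["plan"], float(r["spend"]))
        for r in raw]
db.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# --- ELT: load raw payloads first, then transform with SQL inside the target ---
db.execute("CREATE TABLE landing (payload TEXT)")
db.executemany("INSERT INTO landing VALUES (?)",
               [(json.dumps(r),) for r in raw])
db.execute("""
    CREATE TABLE customers_elt AS
    SELECT json_extract(payload, '$.id')                  AS id,
           json_extract(payload, '$.user.name')           AS name,
           json_extract(payload, '$.user.plan')           AS plan,
           CAST(json_extract(payload, '$.spend') AS REAL) AS spend
    FROM landing
""")
```

Both paths end with the same flattened table; the difference is where the transformation logic lives and when it runs.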
Whether your data currently resides on-prem or in the cloud, you need to choose a cloud storage solution that accommodates your corporate needs. While the options are nearly limitless, here are some guidelines.
To begin with, don’t conduct analysis off a production database. Instead, set up another location that replicates your data at a time interval you specify. The process of mirroring data introduces latency into your analysis, but it protects the production data store your customers depend on and can prevent a devastating system overload.
The best practice is to establish a dedicated data environment used exclusively for analysis. Storage that pulls double duty with production, as an application backend or for other functions, risks compromising both ongoing operations and analysis.
For large amounts of structured and unstructured data, a data lake (like Amazon S3) can provide a cost-effective storage solution. It offers plenty of space at an affordable price point. But a data lake typically suffers from an inability to perform queries, change computing power or conduct other fine-tuned operations. While this is changing somewhat as technology evolves (for example, Athena now allows users to query an S3 data lake through SQL), the more common method is transferring a select subset of data into your data warehouse to perform complex analysis.
In the end, data warehouses supply the best analytics environment — better than data lakes, standard databases or other possibilities. The three main data warehouse options are Amazon Redshift, Google BigQuery and Snowflake.
The first question you should ask about these options involves how each of them scales. Costs of storing your data will largely depend on the costs of scaling … and each warehouse scales in very different ways. Redshift bundles storage and computing power into nodes, so you can’t increase one without paying for more of the other.
Snowflake, on the other hand, untethers storage from computing power, so you can buy just what you need in each category. BigQuery tends to straddle the line, pricing storage (active or long-term) and queries (on-demand or flat-rate) separately. Understand your standard use case, and apply it to each of the three options to determine which is the best fit for you.
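The tradeoff can be illustrated with a toy cost model. All rates and workload numbers below are made-up placeholders for illustration, not actual vendor pricing:

```python
def bundled_monthly_cost(nodes: int, price_per_node: float = 500.0) -> float:
    # Bundled model: each node carries fixed storage AND compute,
    # so a spike in either dimension forces you to buy whole nodes.
    return nodes * price_per_node

def decoupled_monthly_cost(tb_stored: float, compute_hours: float,
                           price_per_tb: float = 25.0,
                           price_per_hour: float = 2.0) -> float:
    # Decoupled model: storage and compute are billed independently,
    # so a compute-heavy month doesn't force extra storage purchases.
    return tb_stored * price_per_tb + compute_hours * price_per_hour

# A compute-heavy month: only 2 TB stored, but 400 compute hours.
# The bundled model needs 4 nodes to cover the compute spike.
bundled = bundled_monthly_cost(4)            # 4 * 500 = 2000.0
decoupled = decoupled_monthly_cost(2, 400)   # 2 * 25 + 400 * 2 = 850.0
```

The point isn’t the specific numbers — it’s that the same workload can cost very different amounts depending on how the pricing model matches the shape of your usage.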
As you make your selection, you can take solace in this good news: migrations are much easier than they used to be, so a decision to use a certain warehousing solution today does not entail a yearslong commitment. Because data migrations are so much easier now than they once were, Sisense for Cloud Data Teams conducts quarterly reevaluations for our clients to help them decide whether they should migrate for better value.
Options for analysis have generally fallen into two categories.
The first basic option caters to engineers who code in native languages, comprehend the data models they’re working with and prefer a notebook-style environment. The ad hoc analysis they conduct is both more powerful and more difficult. Their tools, therefore, address data in the unopinionated, unfiltered form that’s most appropriate for complex data sets and constantly changing data models.
The second option is for BI users who don’t know SQL, R and Python. Instead, it provides helpful layers between the users and the data. These users depend on semantic processing and need an intuitive UX that masks data through a wizard, offering features like drag-and-drop, data dictionaries and drop-down lists.
This binary distinction is quickly melting away, however. A decade ago, coding in native languages was the domain of elite data experts. Today, SQL proficiency is commonly found in employees across multiple teams and levels of seniority.
Many companies find the best solution is one that straddles BI and data science. These hybrid tools allow technical experts, laypeople and everyone in between to access the same data and interact with it in the ways they prefer. Sisense for Cloud Data Teams occupies this rapidly expanding niche, with a solution that includes all of these personas.
Consider who will consume the product — not simply who uses it. Choose the most relevant option for sharing data and insights among various users and consumers. The sharing capabilities you select should cater to three general categories of persona:
- Traditional BI consumers and analysts, who want translation into more intuitive UX and static, production-level charts
- Higher-level, semi-technical users, such as data analysts, who consume and build charts and may be somewhat proficient with SQL
- Data experts who work in SQL, Python and R every day
Search for a data platform that allows each persona to accomplish key tasks and makes it easy to share insights discovered in their data analysis. Slack is the kingpin of collaboration tools now, but lesser-known competitors may be appropriate for your needs.
Step Three: Account for Performance and Experience
These factors may not directly drive your choice of tools, but reflecting on them can help guide your decision.
Although promises of “real-time” analysis abound in this industry, latency exists in all data analysis operations. Because faster generally equates to greater cost, set a wait threshold based on the real needs of your users. Then you can fine-tune the two main sources of latency in your data analysis:
- Data latency is a function of how fast data is transferred from the source to your processing capability. It depends on the computing power of the source, your ETL/ELT tool and the layers between the data and the display
- Processing speed depends on computing power, which is governed by how fast and how often the ETL/ELT script runs, blends and conducts analysis
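These two sources add up. Here’s a back-of-the-envelope sketch of the combined effect — the refresh interval and query runtime are hypothetical numbers, not recommendations:

```python
def worst_case_time_to_insight(refresh_interval_s: float,
                               query_runtime_s: float) -> float:
    """Worst-case seconds between an event occurring and it appearing in a result.

    Data latency: in the worst case, a record lands just after a replica
    refresh, so it waits a full interval before analysis can see it.
    Processing latency: the query itself still has to run.
    """
    return refresh_interval_s + query_runtime_s

# A 15-minute replica refresh plus a 30-second query:
worst_case_time_to_insight(15 * 60, 30)  # 900 + 30 -> 930 seconds
```

Working the number out this way helps set user expectations: a dashboard fed by a 15-minute refresh can never honestly promise “real-time” data, no matter how fast the query runs.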
Time to Value
Once you decide on a solution, how long will it take to start seeing real value from it? Forget your assumptions. A full solution can deliver insights in hours, not weeks.
Some companies insist on time-intensive steps like setting up data definitions and semantic layers or instituting new data models. A far quicker and better way — which is also Sisense for Cloud Data Teams’ method — uses your existing data flows and data models for a native, unopinionated solution that installs directly and can run queries in minutes.
The Chaperone Experience
Even with the promise of machine learning, many of the best data analytics insights come from collaboration between human beings.
While a wizard can help BI users craft queries, interactions with an automated tool present risks as well. They’re helpful in many ways, but wizards can’t understand a data model and explain it to a BI analyst. Likewise, a wizard can’t comprehend the intent behind a BI request to make sure the script accomplishes the desired task. Wizards’ limitations can hamstring BI users or, worse, introduce the risk of catastrophic data misuse.
The best outcomes start with relationships between real humans, which we call the chaperone experience. In the chaperone experience, data experts collaborate with BI analysts to understand the root questions they want answered and why they’re important. The company isn’t left at the mercy of the data definition model or interfaces that can’t understand the BI relevance of the data. When real people talk — through Slack, by email, face to face or by any other method — the data experts can curate data and partner with BI experts for profound benefit.
This is a foundational element in Sisense for Cloud Data Teams’ offerings, which facilitate this kind of close partnership. For instance, a data engineer who appreciates the BI perspective can create a view for deep analysis, and publish the results as a dataset. The BI unit receives an accurate, up-to-date and specific representation of the data it needs to perform its analysis, since both parties are familiar and comfortable with the Sisense for Cloud Data Teams platform. The data can now be used to field business, analyst and scientist use cases simultaneously.
Any tool you select will need to support role-based access control (RBAC), allowing you to set permissions and privileges in accordance with how you want to protect data in your organization.
The main question to answer is: how can we keep data secure and simultaneously available to the right people? Your answer determines who in the company gets to dig into what data, and to what depth. It also directs how view-only and editing privileges are assigned.
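As a sketch of how such controls fit together — the roles, actions and dataset names below are hypothetical examples, not features of any particular product:

```python
# Minimal role-based access control (RBAC) sketch.
# Roles, actions and dataset names are hypothetical examples.
ROLE_PERMISSIONS = {
    "viewer":   {"read"},
    "analyst":  {"read", "query"},
    "engineer": {"read", "query", "edit"},
}

# Per-dataset allow-lists: tables holding PII stay restricted
# to specific roles regardless of their general permissions.
RESTRICTED_DATASETS = {"customers_pii": {"engineer"}}

def can(role: str, action: str, dataset: str) -> bool:
    """Return True if `role` may perform `action` on `dataset`."""
    allowed_roles = RESTRICTED_DATASETS.get(dataset)
    if allowed_roles is not None and role not in allowed_roles:
        return False
    return action in ROLE_PERMISSIONS.get(role, set())
```

Here `can("analyst", "query", "sales")` allows the query, while `can("analyst", "read", "customers_pii")` denies it, because the PII table is restricted to engineers no matter what the analyst role normally permits.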
As a trend, the democratization of data is powerful, but it can also be dangerous. Treating access haphazardly can make you vulnerable to illegal or corrupt activity, especially if your data includes individuals’ PII.
Your company may be screaming “We need data analysis now!” — but if it hasn’t worked through these items, it’s not ready for it.
And, to be fair, thinking through these steps is work. Data analysis isn’t one-size-fits-all. Selecting the right platform doesn’t happen quickly. It takes meticulous self-examination, careful research of current options and thorough engagement with experts in the field.
Throughout the process, talk with vendors. Discuss your process of discovery through this buyers’ guide. How well do their solutions match up with what you’ve discovered about your company and your requirements? In our experience, the happiest clients are those who have focused on their needs first, rather than on product features. Often, the best solution is composite, made of many discrete elements that work together.
We can talk you through this guide, and help you select a solution that fits your company best. We focus on your business needs (rather than on product features) to offer customized pipeline, storage, analysis and sharing solutions. With us, no one at your firm — from the most hard-core data scientist to the least technical BI analyst — gets left behind.
You’ll also find Sisense for Cloud Data Teams:
- Can adjust speed to your requirements
- Gets your solution set up in hours, not weeks
- Allows the chaperone experience for excellent inter-unit collaboration
- Facilitates permissions and other access controls
If you think hard about the questions posed in this guide, you’ll select a data analytics solution for the best possible reasons. The homework you’ve done will let you enjoy safe, reliable, affordable data analysis that’s accessible for your entire organization.
If you want to see how your team can use Sisense for Cloud Data Teams to increase the value of citizen data scientists, watch a demo or set up a free trial today.