Everything You Need to Know About Data Mining and BI
What is data mining?
Remember that scene in A Beautiful Mind where John Nash, played by Russell Crowe, stares at huge screens of numbers that have totally baffled the Pentagon’s codebreakers, and as he does, connections and patterns start to reveal themselves to him, piece by piece, until he cracks the code?
That’s essentially what data mining is – except, thankfully, you don’t have to do it in your head.
Put simply, data mining means applying different mathematical and statistical techniques to huge pools of data in repositories (i.e. Big Data), in order to tease out correlations and trends.
How has it changed over time?
The evolution of Big Data, and with it big data mining, has been one of perspective.
Initially, it was all about looking over our shoulders: we trawled through historical data to try and figure out how we got here, what worked in the past, what didn’t, and why.
This is great, but it has limited scope; while you can learn from your mistakes, circumstances change, and knowing what happened last month is nowhere near as useful as knowing what’s happening right now… or, even better, what’s likely to happen tomorrow.
Enter the age of predictive data mining, which focuses instead on what is likely to happen next.
As data mining techniques have become more sophisticated, we’ve begun looking ahead, mapping out patterns into the future and applying what we’ve learned to the situations on the horizon, too.
Why is data mining useful?
Having a clear and accurate picture both of how you’re doing now and what’s just around the corner is obviously a massive benefit for any business.
Once you can judge with confidence what’s coming, you can plan for the best and prepare for the worst, redistributing or scaling up resources where needed, allocating budgets effectively, adapting things like order size and storage requirements, and generally positioning yourself, your team and your finances to be as streamlined and effective as you possibly can.
All of which, of course, has a huge effect on the success of your business – and your bottom line.
As Nir Regev, Lead Data Scientist at Sisense, explains:
For any data mining process, you need to start with some kind of hypothesis, some kind of working assumptions, in order to know what to focus on. You can’t just collect every single piece of data in the organization – it’s inefficient. Collect data that only seems relevant to your assumptions.
Then, keep digging deep into the data with tools such as Python or R, to carry out EDA (Exploratory Data Analysis) and then you’ll be able to extract insights and features from the data. Without a preliminary hypothesis with regard to some goal (e.g. reduce customer churn), you’ll just be lost.
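To make the hypothesis-first approach concrete, here is a minimal Python sketch of one exploratory check. It is only an illustration, not Sisense's method: the hypothesis (customers who file many support tickets churn more often), the field names, the threshold and the records themselves are all invented for the example.

```python
from collections import defaultdict

# Hypothetical, made-up customer records (illustration only).
customers = [
    {"id": 1, "support_tickets": 0, "churned": False},
    {"id": 2, "support_tickets": 5, "churned": True},
    {"id": 3, "support_tickets": 1, "churned": False},
    {"id": 4, "support_tickets": 4, "churned": True},
    {"id": 5, "support_tickets": 0, "churned": False},
    {"id": 6, "support_tickets": 3, "churned": False},
    {"id": 7, "support_tickets": 6, "churned": True},
    {"id": 8, "support_tickets": 1, "churned": True},
]


def churn_rate_by_group(records, threshold=3):
    """Compare churn rates below vs. at-or-above a ticket threshold.

    The threshold is an assumption chosen for the example.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [churned, total]
    for record in records:
        high = record["support_tickets"] >= threshold
        group = "high_tickets" if high else "low_tickets"
        counts[group][0] += int(record["churned"])
        counts[group][1] += 1
    return {g: churned / total for g, (churned, total) in counts.items()}


rates = churn_rate_by_group(customers)
print(rates)
```

A clear gap between the two rates suggests the hypothesis deserves a closer look; a negligible gap suggests moving on to the next assumption. On a real data set you would also want to check that the difference is statistically meaningful.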
How might that work in practice?
Let’s take the retail sector as an example.
Imagine you’re a retail fashion chain, collecting vast pools of data daily from interactions with suppliers and customers. All these terabytes are transferred into a sprawling data warehouse – but just having that information doesn’t do you much good. You have to mine it.
Using data mining techniques, you start to identify patterns in customer buying behavior. You generate a picture of your customers’ habits, for example the days they shop on and the products they go for. You see how behavioral trends shift from month to month, season to season.
Armed with this information, you begin to make predictions for the future, including detailed, accurate forecasts about which products will sell best, and when.
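As a rough sketch of how a pattern such as "which days customers shop on" can feed a forecast, the Python below averages past sales by weekday and projects those averages onto the coming week. This is a deliberately naive baseline, and the sales figures, dates and function names are all made up for illustration; a real forecast would also need to account for seasonality, promotions and trends.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Made-up daily unit sales for three weeks, starting Monday 2024-01-01.
start = date(2024, 1, 1)
units = [
    100, 90, 95, 110, 150, 220, 180,   # week 1 (Mon..Sun)
    104, 92, 99, 112, 156, 230, 186,   # week 2
    108, 94, 97, 114, 159, 234, 192,   # week 3
]
sales = [(start + timedelta(days=i), u) for i, u in enumerate(units)]


def weekday_averages(history):
    """Average units sold per weekday (0 = Monday .. 6 = Sunday)."""
    by_day = defaultdict(list)
    for day, sold in history:
        by_day[day.weekday()].append(sold)
    return {weekday: mean(values) for weekday, values in by_day.items()}


def forecast_next_week(history):
    """Project each weekday's historical average onto the next 7 days."""
    averages = weekday_averages(history)
    last_day = history[-1][0]
    next_days = [last_day + timedelta(days=i) for i in range(1, 8)]
    return [(day, averages[day.weekday()]) for day in next_days]


forecast = forecast_next_week(sales)
busiest_day, busiest_units = max(forecast, key=lambda item: item[1])
print(busiest_day.strftime("%A"), busiest_units)
```

Even a baseline this simple shows which days are likely to need the most stock and staff, which is the kind of input the planning decisions described here depend on.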
You start accurately planning the shift schedule for workers, and plan recruitment drives well ahead of time to find temporary staff, in order to cover busy days or times of year – without ending up short-handed or paying out for excessive overtime. You increase orders with long enough lead times to avoid extortionate late shipping fees, while avoiding over-ordering mistakes that lead to high warehousing costs for excess stock.
If you’re collecting data, all the answers you’re looking for are probably buried deep in that mess of numbers. Data mining is simply a way to dig them out.
Let’s take another example: a cloud-hosted, SaaS technology provider whose platform is used by tens of thousands of businesses worldwide. In this case, you would generate huge pools of data daily on everything from sign-in times to user behavior, errors flagged in the system, and so on.
By mining and collating data on crash reports, you would soon establish patterns in what causes problems in the platform, spotting bugs or weaknesses in the system, issues that occur when working on certain devices or operating systems, or which actions trigger problems the most. Using this information, you could continually improve your product offering, smoothing out glitches to guarantee uninterrupted service for customers – and protect against cyber threats.
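One simple way to start spotting such patterns is to count how often each combination of operating system and user action appears in the crash reports. The Python sketch below does exactly that; the report fields (`os`, `action`) and the sample records are hypothetical, and real crash data would typically carry many more attributes.

```python
from collections import Counter

# Hypothetical crash reports (illustration only).
crash_reports = [
    {"os": "Android 13", "action": "export_pdf"},
    {"os": "iOS 17", "action": "upload_image"},
    {"os": "Android 13", "action": "export_pdf"},
    {"os": "Windows 11", "action": "login"},
    {"os": "iOS 17", "action": "upload_image"},
    {"os": "Android 13", "action": "export_pdf"},
]


def top_crash_patterns(reports, n=2):
    """Return the n most frequent (os, action) pairs with their counts."""
    counts = Counter((r["os"], r["action"]) for r in reports)
    return counts.most_common(n)


patterns = top_crash_patterns(crash_reports)
for (os_name, action), count in patterns:
    print(f"{os_name} / {action}: {count} crashes")
```

The most frequent pairs tell you where to look first, for example a feature that fails mostly on one operating system.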
What’s more, by getting a complete picture of factors like popular usage times and frequently used tools and features, you could a) figure out where to take your product next, b) market more effectively based on customer preferences, and c) predict when demand will be highest in each region, ensuring you allocate enough resources and servers to handle it.
Finding answers that are already there
Data mining solves problems by analyzing data you already have in your database… but it still takes the right skills, tools and insights to frame the questions right, and to make sense of it all.
Take a commonly asked question: How do I improve customer loyalty in a highly competitive marketplace?
The first thing you could do is figure out what a fickle customer looks like. By mining all the data you have on customer behavior and characteristics, you can build a general picture of what constitutes a customer who changes products, as opposed to a loyal one, including their distinguishing characteristics.
Once you know, you can use this to identify potentially fickle customers. You can then preempt the problem by finding ways to sustain their interest before their eyes wander. You might target them for special treatment, product features or offers that may be too expensive to roll out across the board, but are worth it when applied exclusively to this group.
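The two steps above, building a picture of the fickle customer and then using it to flag current customers, can be sketched in Python as follows. This is a toy nearest-profile comparison under stated assumptions: the two features, the historical records and the customers being scored are invented, and a production model would use far more data and proper validation.

```python
from statistics import mean

# Hypothetical features chosen for the example.
FEATURES = ("days_since_last_order", "discount_share")

# Made-up historical customers; "switched" marks those who left for a rival.
history = [
    {"days_since_last_order": 10, "discount_share": 0.1, "switched": False},
    {"days_since_last_order": 20, "discount_share": 0.2, "switched": False},
    {"days_since_last_order": 15, "discount_share": 0.0, "switched": False},
    {"days_since_last_order": 80, "discount_share": 0.8, "switched": True},
    {"days_since_last_order": 90, "discount_share": 0.9, "switched": True},
    {"days_since_last_order": 70, "discount_share": 0.7, "switched": True},
]


def feature_ranges(records):
    """Min and max of each feature, used to put features on a 0..1 scale."""
    return {f: (min(r[f] for r in records), max(r[f] for r in records))
            for f in FEATURES}


def scaled(record, ranges):
    """Scale a record's features so no single feature dominates the distance."""
    return [(record[f] - ranges[f][0]) / ((ranges[f][1] - ranges[f][0]) or 1)
            for f in FEATURES]


def profile(records, ranges):
    """Average scaled feature values: the 'general picture' of a group."""
    rows = [scaled(r, ranges) for r in records]
    return [mean(column) for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


ranges = feature_ranges(history)
fickle_profile = profile([r for r in history if r["switched"]], ranges)
loyal_profile = profile([r for r in history if not r["switched"]], ranges)

# Made-up current customers to score.
current = {
    "A": {"days_since_last_order": 75, "discount_share": 0.6},
    "B": {"days_since_last_order": 12, "discount_share": 0.1},
}

# Flag customers who sit closer to the fickle profile than the loyal one.
at_risk = [
    name for name, record in current.items()
    if squared_distance(scaled(record, ranges), fickle_profile)
    < squared_distance(scaled(record, ranges), loyal_profile)
]
print(at_risk)
```

The flagged customers are the ones you might single out for the special offers or features mentioned above, rather than rolling them out to everyone.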
Data by itself isn’t enough; it’s the raw material that you need to craft yourself in order to solve the big problems.
The tools you need
Data mining typically requires a powerful programming language like Python or R; the latter is a hugely popular, open-source option used for complex statistics and data analysis on mammoth data sets.
The best results come when you combine this programming language with your BI platform – Sisense, for example, offers full integration with R. This allows you to mash up multiple data sets and prepare your data within your BI platform, transmit that to R for analysis and then bring this back into the BI solution to visualize the results. Combined, it’s a slick and powerful way to drive your predictive analytics.
What’s next for data mining?
As datasets balloon, data mining gets exponentially more complicated. In the future, you’ll use an increasing number of data sources. You’ll collect more raw data, drawn from more devices and geographies, in more formats. There’ll be more work to do in ordering, harmonizing and amalgamating your data before you can even begin to mine it.
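To give a feel for what harmonizing and amalgamating involves, the Python sketch below brings order records from two sources with different field names, date formats and currency representations into one common shape before combining them. Both source layouts and all field names are hypothetical, invented purely for the example.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical source 1: ISO dates, amounts already in cents.
web_orders = [
    {"order_id": "W-1", "date": "2024-03-05", "amount_cents": 1999},
    {"order_id": "W-2", "date": "2024-03-01", "amount_cents": 4500},
]

# Hypothetical source 2: day/month/year dates, amounts as decimal strings.
store_orders = [
    {"ref": "S-7", "sold_on": "03/03/2024", "total": "12.50"},
]


def normalize_web(order):
    return {
        "order_id": order["order_id"],
        "date": date.fromisoformat(order["date"]),
        "amount_cents": order["amount_cents"],
        "source": "web",
    }


def normalize_store(order):
    return {
        "order_id": order["ref"],
        "date": datetime.strptime(order["sold_on"], "%d/%m/%Y").date(),
        # Decimal avoids floating-point rounding when converting to cents.
        "amount_cents": int(Decimal(order["total"]) * 100),
        "source": "store",
    }


# Amalgamate both sources into one list, ordered by date.
combined = sorted(
    [normalize_web(o) for o in web_orders]
    + [normalize_store(o) for o in store_orders],
    key=lambda order: order["date"],
)
for order in combined:
    print(order["date"], order["order_id"], order["amount_cents"])
```

Each new source needs its own small normalizer like these, which is why the preparation work grows as the number of devices, geographies and formats increases.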
At the same time, the scope for finding insights will be ever greater. So long as you can figure out how to mine and manipulate it effectively, you’ll be able to learn more about your business than ever before, helping you to understand precisely how each piece fits together – and how every action and business decision feeds into the mix.
The ways you interact with your data, too, will become more varied and intelligent, encompassing natural speech, bots and machine learning features that predict what you need and automate huge chunks of the process.
That is also why your data mining efforts will need the support of a BI tool that can handle this scope and demand. You’ll need a solution that’s fast and scalable enough to cope with however much data you throw at it, flexible enough to add whatever data sources you want, and able to support the new developments shaping the data mining landscape, from machine learning to Natural Language Processing (NLP).
To thrive in an ever more competitive world, you not only need to mine for patterns and convert them into actionable insights, you need to get better, faster, and smarter at it, too.