What is dark data?

Dark data is the unmanaged information that an organization collects during normal business operations but has no plans to use for analytics or to drive revenue. This could include archived email correspondence, historical records, surveillance footage, web logs and visitor tracking data, or even paper files. A recent survey estimated that, globally, 55% of companies’ data is dark. What the data contains, not to mention its potential value, is often completely unknown.

The rise of big data has contributed to ever-increasing volumes of dark data, as organizations collect and store more and more data from as many sources as possible but lack a plan to process and analyze it, or the resources to devote to the project. Although dark data may have no obvious near-term use or benefit, it can still become an important resource. Technical advances such as artificial intelligence have made mining dark data easier, and a shift in a company’s focus or strategy may require analyzing data sources that were never explored in the past.

Unused dark data also has implications for governance and compliance. Even though this data sits untapped, it can still be exposed in a corporate breach. Another challenge arises when a customer requests that their personal data be deleted: locating every record about that person in a huge, unorganized data lake can be difficult.
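To illustrate why deletion requests are hard to honor in an unorganized lake, here is a minimal Python sketch that scans every file for records belonging to one customer. It assumes, hypothetically, that the lake holds newline-delimited JSON (`*.jsonl`) files whose records store the customer's address under an `email` key; `find_subject_records` is an illustrative helper, not a feature of any particular product. Real lakes mix many formats and field names, which is exactly what makes this search expensive.

```python
import json
from pathlib import Path


def find_subject_records(lake_root, email):
    """Return (file_path, line_number) pairs for records whose 'email'
    field matches the given address, across every *.jsonl file under
    lake_root. The JSON Lines layout and the 'email' key are assumptions."""
    target = email.strip().lower()
    hits = []
    for path in sorted(Path(lake_root).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    # Unorganized lakes often contain blank or malformed lines; skip them.
                    continue
                value = record.get("email") if isinstance(record, dict) else None
                # Compare case-insensitively, since addresses are often stored inconsistently.
                if isinstance(value, str) and value.strip().lower() == target:
                    hits.append((str(path), line_no))
    return hits
```

Calling `find_subject_records("/data/lake", "ann@example.com")` (a hypothetical path and address) lists the file and line of every match for review before deletion. Without a catalog recording which datasets hold personal data, a full scan like this is often the only option.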

Where is dark data generated from?

More than a decade ago, companies started realizing that their data held huge potential, either to drive revenue or to answer questions and support smarter decision-making. There was a push to invest in infrastructure to collect and store the data being automatically generated from many different sources.

Across all industries and company sizes, dark data generally comes from three sources:

Untapped internal data: Information that an organization collects during regular business operations and stores in a data lake but does not use to its full potential. This data remains untapped for several reasons. A company could be overwhelmed by the sheer volume of information or lack the business intelligence tools to organize it. Or the data may be of poor quality, and the company doesn’t have the resources to clean and prepare it for use.

Nontraditional unstructured data: This is commonly media data and digital exhaust, such as the metadata associated with audio, video, and image files and with social media accounts. Getting value out of this enormous volume of information requires a lot of computing power and experienced data analysts.

Deep web data: Information that sits behind a firewall, login, or similar barrier, such as personal credit card and transaction data. This kind of data is more difficult to view and catalog, and may require specialized software to collect and analyze.
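For untapped internal data, a cheap first step is measuring whether it really is too poor in quality to use. The sketch below assumes the records are already loaded as a list of flat dictionaries with hashable values; `profile_records` is a hypothetical helper that reports row count, exact duplicates, and the share of missing values per field.

```python
from collections import Counter


def profile_records(records, fields):
    """Return a small quality profile for a list of dict records.

    A value counts as missing if its key is absent, None, or an empty string.
    Duplicates are rows with identical values across all profiled fields.
    """
    total = len(records)
    # Count identical rows; every copy beyond the first is a duplicate.
    row_keys = Counter(tuple(r.get(f) for f in fields) for r in records)
    duplicates = sum(count - 1 for count in row_keys.values() if count > 1)
    missing = {}
    for f in fields:
        n_missing = sum(1 for r in records if r.get(f) in (None, ""))
        missing[f] = n_missing / total if total else 0.0
    return {"rows": total, "duplicate_rows": duplicates, "missing_share": missing}
```

A profile like this helps decide whether a dataset needs cleaning before analysis or can be used as-is; the thresholds for "too poor" remain a business decision.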

What can you do with dark data?

Organizations may consider this data too outdated, redundant, incomplete, or difficult to access to justify the time and resources needed to extract insights from it. Sometimes they don’t even know it exists. Even if the data is hard to reach or buried in an unorganized data lake, your organization can still analyze it to answer valuable questions about its business operations.

One way companies are taking advantage of this asset is by comparing their dark data reserves to publicly available or acquired data. Some inventive use cases involve mining historical data collected during publicly funded research, whether recent or decades old. This historical data can be compared with present-day statistics to analyze changes in global temperature and climate, pollution levels, and other environmental and chemical factors.
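The core of such a comparison can be sketched in a few lines of Python. This assumes, for illustration only, that both the archived and present-day data have already been reduced to a mapping from metric name to a list of numeric readings in the same units; `compare_periods` is a hypothetical helper, and a real study would also need to reconcile measurement methods between the two periods.

```python
from statistics import mean


def compare_periods(historical, current):
    """Compare average readings between an archived period and the present.

    Both arguments map a metric name (for example a pollutant) to a list of
    numeric readings. Only metrics with readings in both periods are compared.
    Returns {metric: (historical_mean, current_mean, change)}.
    """
    result = {}
    for metric in sorted(historical.keys() & current.keys()):
        old_readings, new_readings = historical[metric], current[metric]
        if not old_readings or not new_readings:
            continue  # statistics.mean raises on empty input, so skip these metrics
        old, new = mean(old_readings), mean(new_readings)
        result[metric] = (old, new, new - old)
    return result
```

With made-up toy values such as `compare_periods({"no2": [10, 20]}, {"no2": [30, 30]})`, the result reports a change of 15 for `no2`; metrics recorded in only one period are left out rather than guessed.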

Many dark data analyses can be performed just by digging into existing logs and archives. Say a customer visits your site to sign up for an email newsletter, perhaps because you’ve tempted them with a giveaway or contest. The form records their email address, name, and other details, but a deeper dive into your web logs can reveal which site referred them, which other pages on your site they viewed, and which page they left from. That data allows you to tailor the marketing message to the customer’s specific interests.
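As a rough sketch of that deeper dive, the Python below parses web server access logs and reconstructs each visitor's path through the site. It assumes the logs use the Apache/NGINX "combined" format and treats each client IP address as one visitor; `visitor_journeys` and the `site_host` parameter are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# "Combined" log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def visitor_journeys(lines, site_host):
    """Group requests by client IP (a rough proxy for one visitor) and record
    the first external referrer plus the ordered list of paths requested."""
    journeys = defaultdict(lambda: {"external_referrer": None, "paths": []})
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined log format
        visit = journeys[match["ip"]]
        visit["paths"].append(match["path"])
        referrer_host = urlparse(match["referrer"]).hostname
        # Keep only the first referrer that points to a different site.
        if visit["external_referrer"] is None and referrer_host and referrer_host != site_host:
            visit["external_referrer"] = match["referrer"]
    return dict(journeys)
```

Because log files are appended in time order, the last entry in `paths` approximates the exit page. IP addresses are a crude visitor identifier (shared networks and changing mobile addresses blur them), so production analytics usually rely on session identifiers instead.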

Is dark data good for analysis?

Most senior IT and business leaders agree that dark data contains far too much potential to be left in the shadows. Using a BI platform like Sisense represents the best strategy to fully leverage these assets. Its AI-powered analytics will simplify your complex data and translate it into dashboards and other visualizations to create a single source of truth for the entire company. 

The first step to getting value out of dark data is usually to classify and organize it. Creating a data catalog that inventories your data lets everyone see what exists and what potential it has. Accumulated dark data can quickly become overwhelming, so pruning it regularly is a good practice. However, pruned data doesn’t necessarily have to be deleted: cloud storage is relatively inexpensive, and properly encrypted archives keep the data available for future analysis. Encryption reduces the breach risk but does not remove compliance obligations, so archived data should still follow retention policies and honor deletion requests.
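A data catalog can start as a simple inventory of what is stored where. The Python sketch below assumes the dark data lives in a file system or mounted storage bucket; `build_catalog`, `CatalogEntry`, and the 365-day staleness threshold are hypothetical choices for illustration. Files flagged as stale are candidates for review or archiving, not automatic deletion.

```python
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CatalogEntry:
    path: str
    size_bytes: int
    extension: str
    days_since_modified: float
    stale: bool


def build_catalog(root, stale_after_days=365, now=None):
    """Inventory every file under root, flagging files not modified
    within stale_after_days as candidates for pruning or archiving."""
    now = time.time() if now is None else now
    entries = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # directories are traversed, not cataloged
        stat = path.stat()
        age_days = (now - stat.st_mtime) / 86400  # seconds per day
        entries.append(CatalogEntry(
            path=str(path.relative_to(root)),
            size_bytes=stat.st_size,
            extension=path.suffix.lower() or "(none)",
            days_since_modified=round(age_days, 1),
            stale=age_days > stale_after_days,
        ))
    return entries
```

Grouping the entries by extension or top-level folder gives a quick picture of what kinds of data have accumulated, which is a practical starting point before investing in a full catalog tool.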

What you’re left with should hold plenty of opportunities for analysis, either for immediate use or to answer questions in the future. Companies that can figure out the best ways to illuminate dark data can expect to better predict trends and patterns.
