To understand how businesses in the future will create the most value with data, it helps to look back at the preceding waves of innovation that have shaped the space. The story of business data takes us all the way back to the ’70s and the introduction of databases, which at the time were very expensive and extremely limited in functionality. In the ’80s, Business Intelligence was created, which gave companies a brand new way to look at information, even if it was limited to a small set of specific information in those databases.
This combination of databases and BI was so valuable that it remained relatively unchanged for most of the next 20 years. What did change is that computers became more prevalent in the workplace, appearing on every desk in every office of companies around the world. Shortly after, the next big wave of innovation in BI analytics appeared: drag-and-drop functionality was introduced to the landscape, making BI accessible to nearly every business user.
In the early 2000s, drag-and-drop led to an enormous number of new people asking basic questions about the simple, limited information that was stored in those early databases. This setup was fantastic for companies with static data needs, but the ubiquity of the internet introduced a series of changes that fundamentally altered the industry. In the next wave of innovation, larger storage and drastically cheaper compute led to the creation of complex new datasets that couldn’t be analyzed easily using standard BI and traditional data models. These changes led us into the modern era of data analysis, where it’s possible to ask hard questions of complex data, but only if you upgrade your tools and processes.
What’s possible today
The universe of questions that data professionals can answer with complex data is effectively infinite, far too large for drag-and-drop BI alone to be effective. The ability of those teams to tackle new questions and add new value to their organizations can be measured on a spectrum of Data Maturity. The idea of Data Maturity is that companies can gain more value from their data by improving the technology, process, and personnel used in analysis.
Organizations that are new to data analysis can focus on simple tasks like collecting all their data into a single source of truth that updates on its own. From there, companies can build a BI engine to transform their complex data using a more democratic, drag-and-drop approach. This produces a huge increase in data value. Once their BI is set up, teams can choose to complement that functionality with a series of advanced analysis options that will take them into the later stages of Data Maturity.
In earlier stages, SQL can do wonders for businesses answering these new types of questions. For basic reporting, you don’t need fancier technology or languages than SQL; you can rely on that language all the time. As you get more advanced, you have to supplement that SQL with Python or R, languages that excel at simplifying complicated data processes. Those languages have libraries and packages that allow you to ask new questions with data, focusing more on predicting the future and less on describing the past.
SQL vs. Python and R
An easy way to understand the difference between SQL and Python/R is to consider the task of painting an empty room. SQL is like the rollers: it streamlines a large part of the work and is how you’ll cover a majority of the space, but rollers aren’t sufficient for every part of the job. You’re not using rollers to get into the corners, finish the trim around the windows, or detail other parts of the room that require precision. SQL is not nuanced; it’s a straightforward, blunt language that lays the foundation for your analysis. It is great at simple analyses, but falls short on more complex ones. A good data analyst knows exactly what SQL is intended to do and uses it for exactly those things.
Python and R are much better suited for the nuanced parts of the project. If SQL is the paint rollers, Python and R are like the sets of brushes, reserved for the parts of the job that require a fine touch. In the same way that there are many sizes and shapes of brushes to suit different painting tasks, there are many Python libraries and R packages built for any specific data analysis you want to run.
The ideal workflow uses each language for what it’s designed to do. You start by doing the general data preparation in SQL, then you pass the data into Python or R for the specific analysis you’re trying to run. This is the data analysis equivalent of doing the bulk of the painting with a roller and polishing off the edges and the more sensitive areas by hand with a brush. You’ll finish the job completely, utilizing time and resources as efficiently as possible. As you paint more rooms, none of them will be exactly the same, but you can use the same method to duplicate your good results.
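As a rough illustration of that workflow, the sketch below does the bulk preparation in SQL and then hands the result to Python for a small predictive step. Everything in it is an assumption made for the example: the in-memory SQLite database, the hypothetical `orders` table and its values, and the choice of a simple least-squares trend line as the "specific analysis."

```python
import sqlite3

# Hypothetical monthly orders table, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 200.0), (3, 150.0), (3, 100.0), (4, 300.0)],
)

# Step 1 (the roller): SQL handles the general preparation,
# collapsing individual orders into one total per month.
rows = conn.execute(
    "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"
).fetchall()
conn.close()

months = [month for month, _ in rows]
totals = [total for _, total in rows]


# Step 2 (the brush): Python fits a least-squares trend line
# to the prepared data so it can be extended one month forward.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, totals)
forecast_month_5 = slope * 5 + intercept
print(f"Forecast for month 5: {forecast_month_5:.1f}")
```

The split mirrors the painting analogy: the `GROUP BY` query covers most of the wall, and the few lines of Python handle the part that looks forward rather than back. In practice the second step would more likely lean on a dedicated library than on a hand-written fit.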
Build infinite new things
The beautiful part of using Python and R on top of your SQL analysis is that a skilled data creator can make so many new things. SQL has limits, but R and Python can take your data projects in many new directions, including certain advanced statistical analysis, data cleaning, complex visualizations, natural language processing, machine learning prep, and a whole lot more. As long as you know when and how to use each language, your analysis capabilities are nearly limitless.
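To make one of those directions concrete, here is a minimal sketch of data cleaning followed by a very basic text analysis in Python. The feedback strings are invented for the example, and the `clean` helper is a hypothetical name; the point is only to show the kind of fine-grained text handling that is awkward to express in plain SQL.

```python
import re
from collections import Counter

# Hypothetical free-text feedback, as it might come back from a SQL query.
raw_feedback = [
    "  Great product!!  ",
    "great PRODUCT, slow shipping",
    "Slow shipping... but GREAT support",
]


def clean(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = [clean(text) for text in raw_feedback]

# Count how often each word appears across all cleaned responses.
word_counts = Counter(word for line in cleaned for word in line.split())
print(word_counts.most_common(3))
```

A real natural language processing task would go well beyond word counts, but the same pattern applies: pull the rows with SQL, then use a purpose-built Python library or R package for the detailed work.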
These languages add new dimensions to the basic building blocks of SQL. Python and R are dynamic, with new packages and libraries appearing all the time to tackle new problems. A platform that allows a team to use a combination of data languages is uniquely positioned to produce incredible value for the data builders at the helm.