It will likely come as no surprise to most that almost every decision we make is driven by data. Often, presentations to our board have more charts than slides!
However, while we’ve built our platform to support and encourage self-serve data exploration, it’s important to remember that self-service analytics without the guidance of a professional data team is a recipe for trouble. As you’ll see below, a recent project only reinforced what we already knew:
Simple Question? Not So Fast
Earlier this year, I was working on a revenue segmentation analysis for a board meeting. The segmentation was new and didn’t yet exist in any model or on any dashboard, so I set off to create my own (using Sisense for Cloud Data Teams, of course!).
I set out to answer a simple question: how much money are we making in this new segment over time? I needed revenue and customer count by quarter in this segment. The query looked like this:
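The original query isn’t reproduced here, but based on the mistakes discussed later in this post, it was roughly of this shape. This is a sketch, not the real thing: `first_payment_date` is an illustrative guess, and the other column names are taken from the discussion of the mistakes below.

```sql
-- Sketch of the flawed query (column names illustrative)
select
    date_trunc('quarter', first_payment_date) as quarter
  , sum(first_monthly_revenue * 3)            as revenue
  , count(distinct customer_id)               as customers
from all_customers
where is_paying = true
  and employee_count between X and Y
group by 1
order by 1
```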
This query gets me quarterly new revenue in this segment. Since our product is a subscription, revenue is recurring and making the chart cumulative over time is a simple matter of checking the “cumulative” checkbox in the chart editor.
Simple enough, right? A simple, understandable query generated results that roughly matched my expectations — job well done! Time to insert the chart into the board deck and move on to the next analysis.
Nope. The analysis was completely wrong. But it looked right – which was the scary part.
I happened to use SQL, but this chart could have been made just as easily in a drag/drop self-serve interface. This wasn’t an issue of me needing to be more technical or more data savvy. I can self-serve. I’m the CTO of a data analysis company. I am very familiar with both SQL and our data architecture. I’m not, however, a data analyst. The problem wasn’t the interface — it was me!
Out of habit, I asked my data team to review my chart to make sure I hadn’t made any major mistakes. I had made three mistakes in 9 short lines of SQL! Did you catch them? Quarterly revenue cannot be this complicated, right?
Here are the mistakes I made:
- sum(first_monthly_revenue * 3): This gets me close to the quarterly revenue for each customer that quarter. However, it ignores plan upgrades or downgrades within the quarter. Likewise, using the “cumulative” checkbox assumes that each customer keeps paying the same amount (their first month’s amount) for every quarter they are a customer, and that isn’t the case for most of our customers.
- is_paying=true: This filters all_customers to those who are paying right now. This query is looking at the customers table “as it is”, when I needed “as it was.” The effect of this mistake is that I’m excluding historical revenue from all customers who have churned.
- employee_count between X and Y: This is another case of “as it is” versus “as it was.” We update the all_customers model regularly with fresh data, including their employee count on LinkedIn. The effect of this on my analysis is that I’m filtering to customers who are currently in this segment. That might be what I wanted, or I could have wanted quarterly customers who were in that segment, at that time. Or maybe customers should be filtered out once they grew out of the segment. Unlike the previous two issues, which were clearly data mistakes, this issue represents a definitional ambiguity that could have had an unknown effect on the analysis.
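To make the “as it was” idea concrete, here is a minimal sketch of a point-in-time lookup. The all_customers_history table and its valid_from/valid_to columns are assumptions for illustration, not part of our actual schema:

```sql
-- Hypothetical history table: one row per customer per period of
-- validity; valid_from/valid_to bound when each attribute value held.
select h.customer_id
     , h.employee_count              -- the value "as it was"
from all_customers_history h
where h.valid_from <= date '2020-01-01'    -- start of the quarter of interest
  and (h.valid_to is null or h.valid_to > date '2020-01-01')
```

With a table like this, the segment filter can be evaluated against the employee count a customer had in a given quarter rather than the one they have today.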
My data team noticed these mistakes almost immediately (of course they did – they are amazing at their jobs) and provided me a corrected query in under an hour.
Our Final Analysis
Supplying them with my incomplete analysis, formatted the way I wanted, made it easy for us to collaborate on the data and rapidly reach the correct conclusion. There was little back-and-forth, I didn’t need to explain what I wanted, and I didn’t need to wait for them to remodel something to make it easier for me to self-serve.
And in the spirit of preventing similar mistakes in the future, they then built a new model, all_customers_monthly. This new model makes it easy for me and others to do more analyses like the one I attempted earlier, without errors.
This is a clear illustration of the strength of agile data teams and models versus upfront data modeling. My data team creates and maintains only the models that are actually needed and can spend the rest of their time on novel analysis. With upfront modeling, they would have had to devote costly resources to building and maintaining models that might, at some point, be useful (but probably wouldn’t).
Here’s a look at the new all_customers_monthly model, so you can see how you might create models that prevent errors and enable answering similar questions:
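The model itself isn’t shown here, but a plausible sketch of its shape, assuming hypothetical invoices and customer_snapshots tables and Postgres-style date functions, might look like:

```sql
-- Sketch of all_customers_monthly: one row per customer per month,
-- with revenue actually invoiced that month and attributes "as they were."
create view all_customers_monthly as
select
    date_trunc('month', i.invoice_date) as month
  , i.customer_id
  , sum(i.amount)                       as monthly_revenue
  , max(s.employee_count)               as employee_count   -- as of that month
from invoices i
join customer_snapshots s
  on s.customer_id    = i.customer_id
 and s.snapshot_month = date_trunc('month', i.invoice_date)
group by 1, 2
```

Because each row records what was invoiced and what the customer looked like in that month, all three of the mistakes above become much harder to make.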
Which makes the revised query even simpler:
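A sketch of what that simpler query could look like against such a model (with the same illustrative segment bounds X and Y):

```sql
-- Quarterly revenue and customer count for the segment, "as it was"
select
    date_trunc('quarter', month)  as quarter
  , sum(monthly_revenue)          as revenue
  , count(distinct customer_id)   as customers
from all_customers_monthly
where employee_count between X and Y
group by 1
order by 1
```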
This example is one of the many reasons we believe so much in empowering data teams. They are the professionals who enable the rest of the organization, from CTOs down to individual contributors, to make good decisions with correct data.
If you’ve had similar experiences with your data analysis, or are looking for new ways to understand your data, we’d love to hear from you!