In my previous blog post, I wrote about Natural Language Query (NLQ, or search analytics for some) as one of the major topics that we, the AI group in Sisense, are working on. NLQ is one of the oldest AI disciplines, but we’ve only recently started hearing about it in conjunction with BI and analytics. In this post, I would like to expand on NLQ and discuss how this AI technology can be leveraged in our domain.
NLQ serves those users who are in a rush, or who lack the skills or permissions to model their data using visualization tools or code editors. So if we think about data access as a function of technical skills and the time it takes to get an answer, NLQ will be the first technology that users will turn to when looking for insights.
Two Experiences for One NLQ
There are usually two possible intentions that drive the first question a user asks when entering a system or app. Take Spotify, for instance. When the app is first opened, the user may be searching for a specific song that was heard while passing by the neighborhood cafe, or the user may want to be surprised with, let’s say, a song from the new experimental album by a Yemen Reggae folk artist.
NLQ can serve both of those experiences using an analytic moment or an exploration mode. Ideally, the NLQ engine will detect the intent of the user, enter into one of these experiences based on the first question, and then facilitate the path accordingly.
In an analytic moment, the user is focused on finding the answer to a specific question, and the NLQ engine follows an ‘ask a specific question, get a specific answer’ process. In general, we find that more than 60% of searches are “analytic moments,” or ad-hoc searches, arriving primarily through OEM consumption and APIs.
On the opposite end of the NLQ experience, we have exploration mode. In this mode, we find that the user has a vague idea of what they want to explore. Usually, in this mode, the user runs a search around a general topic, saves the intermediate results, and splits these results into a few different directions. Let’s go back to our Spotify example. The user will start off with a broad search topic like “folk music” for instance. The results will come back with folk music from around the world, and with no particular direction in mind, the user will begin to drill down into countries, then local streams of interest, and finally land on a Yemen Reggae folk artist. In this mode, the user avoids putting too much effort into the definition of a specific search, and instead, relies on a random exploration path with the assisted exploration of NLQ.
The Challenges of Getting NLQ Up and Running
For new vendors in the analytics market, one of the most obvious challenges is the absence of historical data. Historical data is needed to create a recommendation system that supports the NLQ exploration mode. Without it, the ability to facilitate longer NLQ journeys in exploration mode will be somewhat limited at first.
An additional challenge among BI users is domain-specific lingo. Imagine a marine freight company using Captain Cook slang to refer to distances (fathom), weights (draft), and types of goods (treasures) being shipped across oceans. How would the user get the right answer if the question isn’t phrased right in the first place? No need to batten down the hatches, matey, as any user can create a custom dictionary that will then be used in the NLQ process.
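To make the custom dictionary idea concrete, here is a minimal sketch of how domain terms could be rewritten before a question reaches the NLQ engine. The dictionary entries and the `normalize_question` helper are hypothetical and only illustrate the idea; they are not how any particular product implements it.

```python
import re

# Hypothetical custom dictionary: domain slang -> the term the data model uses.
CUSTOM_DICTIONARY = {
    "treasures": "goods",
    "fathoms": "distance",
}


def normalize_question(question: str, dictionary: dict[str, str]) -> str:
    """Replace whole-word domain terms with their canonical equivalents."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Look up case-insensitively; keep the word unchanged if unknown.
        return dictionary.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, question)


print(normalize_question("Total treasures shipped per route", CUSTOM_DICTIONARY))
# prints: Total goods shipped per route
```

A plain lookup like this only covers single words; multi-word phrases or inflected forms would need a richer matching step.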
Last, and still a very painful challenge for most users, is the need for familiarity with the underlying data and data model: in other words, how the variables are named, and the granularity of their values. Implementing synonyms helps solve the first issue (how the variables are named), and support for multiple arithmetic and comparison functions addresses the second issue (granularity of their values). Once both issues are addressed, the user can ask “how many customers are responsible for 80% of my Q1 2018 income compared to 2017?” and the system will know to look for ‘clients’ and aggregate the ‘revenue’ (the actual variable names in the system) to compare Q1 2018 with Q1 2017.
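The “80% of my income” question combines both ideas: synonyms map the user's words to field names, and an aggregation answers the question. The sketch below assumes a hypothetical synonym table and a tiny in-memory dataset; the function names and sample rows are made up for illustration.

```python
# Hypothetical synonym table: user vocabulary -> actual field names.
SYNONYMS = {"customers": "clients", "income": "revenue"}


def resolve(term: str) -> str:
    """Map a user term to a field name, falling back to the term itself."""
    return SYNONYMS.get(term.lower(), term.lower())


def count_for_share(rows, entity_field, measure_field, share=0.8):
    """Count the fewest entities whose summed measure reaches `share` of the total."""
    totals = {}
    for row in rows:
        key = row[entity_field]
        totals[key] = totals.get(key, 0) + row[measure_field]

    target = share * sum(totals.values())
    running, count = 0.0, 0
    # Take the largest contributors first until the target is reached.
    for value in sorted(totals.values(), reverse=True):
        if running >= target:
            break
        running += value
        count += 1
    return count


# Made-up Q1 rows, using the field names from the data model.
q1_rows = [
    {"clients": "A", "revenue": 500},
    {"clients": "B", "revenue": 300},
    {"clients": "C", "revenue": 100},
    {"clients": "A", "revenue": 50},
    {"clients": "D", "revenue": 50},
]

print(count_for_share(q1_rows, resolve("customers"), resolve("income")))
# prints: 2
```

Running the same function over the rows of each period gives the two numbers to compare.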
Machine Intent vs. User Intent
What if the NLQ engine took the question above, “how many customers are responsible for 80% of my Q1 2018 income compared to 2017?”, to mean a comparison between Q1 2018 and Q1 2017, when the user actually meant to compare Q1 2018 to the whole of 2017?
What we need here is the ability to edit the intent detected by the NLQ engine and replace it with the intent of the user. This is called “active learning,” and it lets the engine converge faster on more personalized interpretations of queries, which over time become an inherent part of the NLQ engine.
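As a rough illustration of letting the user's intent override the detected one, the sketch below remembers a correction and reuses it the next time the same question is asked. The `Intent` and `IntentMemory` names are hypothetical, and a plain lookup stands in for the learning step; a real engine would more likely feed corrections back as labeled examples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    """A simplified, hypothetical representation of a parsed question."""
    measure: str
    period: str
    compare_to: str


@dataclass
class IntentMemory:
    """Stores user corrections keyed by the normalized question text."""
    corrections: dict[str, Intent] = field(default_factory=dict)

    @staticmethod
    def _key(question: str) -> str:
        # Ignore case and extra whitespace when matching questions.
        return " ".join(question.lower().split())

    def record_correction(self, question: str, corrected: Intent) -> None:
        self.corrections[self._key(question)] = corrected

    def resolve(self, question: str, detected: Intent) -> Intent:
        # Prefer a stored user correction over the engine's own guess.
        return self.corrections.get(self._key(question), detected)


memory = IntentMemory()
question = "How many customers are responsible for 80% of my Q1 2018 income compared to 2017?"
detected = Intent(measure="revenue", period="Q1 2018", compare_to="Q1 2017")
corrected = Intent(measure="revenue", period="Q1 2018", compare_to="2017")

memory.record_correction(question, corrected)
print(memory.resolve(question, detected).compare_to)
# prints: 2017
```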
Using AI to its Fullest
NLQ is gaining traction in the big data analytics tools domain for its quick answers and ease of use. By using two very distinct experiences, analytic moment and exploration mode, NLQ accurately serves a wide range of queries and skillsets, making it the go-to AI technology for many analysts and business users. As for initiating NLQ in analytics, even though there are still challenges with historical data, there are solutions that can help. Next time, we can discuss one of these solutions, a recommendation system for context-based autocomplete, bringing AI one step closer to reaching its full potential in analytics.
There are many activities going on with AI today, from experimental to actual use cases. In our AI group, we are making major strides towards the future of AI by being able to tie the lineage, from ingestion through preparation to analytics, all on metadata (and to semantics in the future), integrating AI into each aspect of the BI lifecycle. I’m lucky to be part of this fast-paced AI exploration and to see how our customers are using our developments.