Understanding Augmented Analytics
I attended Gartner’s Data & Analytics Summit 2018 in Grapevine, Texas, where topics ranging from data governance best practices to digital ethics permeated the keynotes and decorated the vendor stalls. I was there as Narrative Science’s engineering representative, so I had the opportunity to field a steady stream of engaging questions about how our extensions help decision-makers get to insight faster. A major theme of these questions was how our company uses augmented analytics – a term Gartner defines as the “[use of] machine learning to automate data preparation, insight discovery and insight sharing for a broad range of business users, operational workers, and citizen data scientists.”
At first glance, this definition didn’t sound particularly revelatory to me; in fact, it aligns squarely with the goal of Narrative Science. Researching augmented analytics after the conference, however, made clear to me that tangible examples of the technology in action are hard to come by. Combing through thought pieces and industry reports left me with more questions than answers: What types of automation does augmented analytics apply? How are these insights being discovered and shared? And – most importantly for me – what are the actual analytics being employed?
Diving into real-world examples
I can at least start with what I know best. Having led the work on the analytic engine powering our extensions for over three years, I can offer a peek behind the curtain at some of the algorithms behind the content our augmented analytics product produces. As a refresher, our intelligent automation platform provides natural language generation (NLG) functionality that integrates into the top Business Intelligence (BI) tools. These extensions analyze business data and create insightful stories about it, delivered instantly and without tedious configuration or a deep analytical skillset.
To illustrate this point, let’s dive more deeply into two sets of analytics that we run to see how they impact our stories and provide the most insightful content to our readers.
Regression Analysis for Drivers
One of the major promises of BI is the ability to not just improve understanding of data, but to use that understanding to drive better business decisions. Thus the work isn’t done until the insights are accessible and actionable for the reader. In order for insight to be accessible and actionable, more upfront work is often required – for instance, building sophisticated mathematical models. This process typically involves many time-consuming steps: initial model creation, subsequent tuning/validation, and eventual productization.
Imagine being a small business owner with years’ worth of data:
The data includes a critical measure we’d like to better understand (TripAdvisor Score) as well as many others that likely contribute to our critical measure. With few resources and limited data analysis experience, building a model may not be feasible. Furthermore, a model is worthless if the resulting insights are not easily understood by the people who need them.
Luckily, our extensions can both build the model and describe the results. Given the above dataset, our analysis will automatically produce text like the following:
As Customer Service increased, TripAdvisor Score increased based on the data provided. Specifically, when Customer Service increased by 10, TripAdvisor Score increased 3.27. There may be other factors contributing to TripAdvisor Score, but there is evidence of a very strong relationship.
Under the hood, the product is performing a multivariate regression analysis, running a suite of diagnostic tests, and evaluating the model, but the business owner doesn’t need to know that – he or she can jump right to the insights.
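To make the regression step a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the engine behind our extensions: it fits an ordinary least squares model by solving the normal equations, computes R² as a single stand-in for a fuller suite of diagnostics, and turns one coefficient into a sentence. The function names (fit_linear_model, r_squared, describe_driver), the Cleanliness column, and all of the numbers are hypothetical and were made up for this example.

```python
from typing import List, Sequence


def fit_linear_model(rows: Sequence[Sequence[float]],
                     target: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a constant 1.0 to every row so the model has an intercept.
    design = [[1.0, *row] for row in rows]
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def r_squared(rows: Sequence[Sequence[float]],
              target: Sequence[float],
              coefs: Sequence[float]) -> float:
    """Share of the target's variance explained by the fitted model."""
    predictions = [coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
                   for row in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - p) ** 2 for y, p in zip(target, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def describe_driver(name: str, target_name: str, coef: float,
                    step: float = 10.0) -> str:
    """Phrase one regression coefficient as a plain-language sentence."""
    direction = "increased" if coef > 0 else "decreased"
    return (f"When {name} increased by {step:g}, {target_name} "
            f"{direction} {abs(coef * step):.2f}.")


# Hypothetical data: columns are (Customer Service, Cleanliness).
predictors = [(50, 60), (60, 55), (70, 80), (80, 65), (90, 90)]
scores = [2.6, 2.85, 3.4, 3.55, 4.1]

coefs = fit_linear_model(predictors, scores)
print(describe_driver("Customer Service", "TripAdvisor Score", coefs[1]))
print(f"R^2 = {r_squared(predictors, scores, coefs):.3f}")
```

A production system would typically rely on a numerical library rather than hand-rolled elimination, and would check far more than R² before describing a relationship as strong. The sketch only shows the shape of the pipeline: fit, evaluate, then narrate.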
Another valuable analysis for time series data is determining whether the values follow a regular, repeating pattern and, if so, what the characteristics of that pattern are. Examples might include seasonal changes in temperature or electricity usage in a typical household over time. What’s important here is not just the knowledge that these values exhibit periodic behavior, but also the length of the period over which they repeat.
Consider this Chicago public transit ridership data (taken from the Chicago Open Data Portal). Our products take this data and produce the following piece of content:
Ridership experienced cyclicity, repeating each cycle about every 6.99 days. There was also a pattern of smaller cycles that repeated about every 3.5 days.
Even though the results are easily understood, there’s a lot of complex work being done behind the scenes: data detrending, spectral analysis, and model calibration. And while both of these examples are based on static datasets, it’s important to note that our extensions are truly dynamic – by default, our suite of analytics requires no custom configuration, with stories that respond instantly to any filters and selections applied to the data.
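For readers curious about the periodicity steps named above, here is a minimal Python sketch of detrending followed by a simple spectral analysis. It is an assumption-laden illustration, not our production code: it removes a least-squares straight line, computes a periodogram with a plain discrete Fourier transform, and reports the periods with the most power. The function names (detrend, periodogram, dominant_periods) are hypothetical, and the ridership series is synthetic, built with known 7-day and 3.5-day cycles rather than taken from the Chicago dataset.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def detrend(values: Sequence[float]) -> List[float]:
    """Subtract the least-squares straight line from the series."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    slope = (
        sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
        / sum((t - mean_t) ** 2 for t in range(n))
    )
    return [v - (mean_v + slope * (t - mean_t)) for t, v in enumerate(values)]


def periodogram(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (period, power) pairs from a direct O(n^2) DFT."""
    n = len(values)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        # Frequency bin k corresponds to a cycle length of n / k samples.
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum


def dominant_periods(values: Sequence[float], top: int = 2) -> List[float]:
    """Periods of the strongest cycles, ordered by decreasing power."""
    spectrum = periodogram(detrend(values))
    strongest = sorted(spectrum, key=lambda pair: pair[1], reverse=True)
    return [period for period, _ in strongest[:top]]


# Synthetic daily ridership over 8 weeks: a gentle upward trend plus a
# weekly cycle and a weaker half-weekly cycle (all values made up).
ridership = [
    1000 + 2 * day
    + 200 * math.cos(2 * math.pi * day / 7)
    + 80 * math.cos(2 * math.pi * day / 3.5)
    for day in range(56)
]

for period in dominant_periods(ridership):
    print(f"Cycle repeating about every {period:.2f} days")
```

Real data is noisier than this, and cycle lengths rarely land exactly on a frequency bin, which is one reason a calibration step matters in practice. For longer series, a fast Fourier transform from a numerical library would replace the direct sum used here.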
Why Augmented Analytics?
In building augmented analytics products and working with best-in-class BI partners, I’ve found there are two facets of augmented analytics that add the most value to an organization – data literacy and data democratization.
As the previous examples show, while the underlying algorithms and procedures are inherently technical, the output language is straightforward; by design, it can be understood by anyone. This inherent accessibility promotes data literacy – it empowers less technical readers to make better sense of the data and thereby better business decisions. It frees those readers to do what they do best – make decisions – by minimizing the time they need to spend interpreting insights.
Similarly, readers of these analyses coming from a data science or technical background can use augmented analytics to quickly test theories and get a sense of their data, all before having to write the code to build the models themselves. In this way, the stories can benefit all types of users and grant everyone useful access to the data – in other words, data democratization.
Taken together, these benefits amount to a technology trend that’s exceedingly important, and one I’m proud to say Narrative Science is a part of.
Feel free to reach out to me on Twitter: @alsip89.