Understanding the Landscape of Natural Language Technologies
By: Nate Nichols
It’s impossible to know for sure, but anthropologists generally believe that modern humans evolved roughly 200,000 – 300,000 years ago. Human language, according to many linguists, originated roughly 100,000 years ago. If it took us 100,000 – 200,000 years to develop language, perhaps it’s not surprising that computers have struggled with human language since the earliest days of computing.
Fortunately, this is beginning to change. Computers are becoming increasingly facile with human language (called “natural language” in technologists’ jargon). Just as humans interact with language in a variety of ways (speaking, reading, writing, etc.), computers use a variety of techniques to handle natural language, all of which fall under the broad umbrella of Artificial Intelligence (AI). Four of these techniques are closely related and easy to confuse:
- Natural Language Generation (NLG) is when a computer produces human language from a set of facts or data it wishes to express (e.g. producing a one-paragraph weather forecast from a weather prediction model).
- Natural Language Processing (NLP) is when a computer produces data from human language (e.g. assessing the likelihood of an email being spam).
- Natural Language Querying (NLQ) is when a computer turns human language into a database query (e.g. typing “What are my sales for the last six months?” into a search box and getting back a bar chart).
- Natural Language Understanding (NLU) is when a computer genuinely understands a piece of human language. This is a hard problem, but we’re starting to see some progress here, particularly in “virtual assistants” such as Alexa, Siri, Cortana, Amy@x.ai, etc.
These techniques aren’t mutually exclusive: each focuses on a different aspect of how computers interact with human language, so they can complement one another.
To show how these different techniques can be used together, let’s start with a simple example and layer on the various natural language techniques.
Let’s say you work in the corporate office of a nationwide fast-food chain, with franchises across the country. Each franchise has many Yelp reviews, and some franchises are rated higher than others. You want to know why some franchises score higher than others.
Natural Language Processing
You could start to tackle this problem by using Natural Language Processing to pull out the words and phrases that appear mostly in positive reviews or mostly in negative ones. Using this approach, you would build a database that you could query to learn, for example, that phrases like “slow service” or “cold food” appear in many negative reviews, while “clean floors” and “friendly cashiers” appear in many positive ones. You could also use this database to build a dashboard (via a platform like Tableau or Qlik) that would let you explore this data further.
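To make the NLP step concrete, here is a minimal Python sketch of one simple way to surface such phrases. It counts two-word phrases (bigrams) per review and keeps those that appear at least three times as often in one class of review as in the other. The sample reviews, the `polarized_phrases` function, and its `min_count` and `ratio` thresholds are all hypothetical. Production systems typically rely on much richer language models, but this frequency-ratio approach captures the basic intuition.

```python
import re
from collections import Counter

def bigrams(text: str) -> list[str]:
    """Lowercase a review, split it into words, and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def polarized_phrases(positive, negative, min_count=2, ratio=3.0):
    """Split phrases into those that lean strongly positive or strongly negative.

    Each phrase is counted at most once per review. A phrase qualifies if it
    appears in at least `min_count` reviews of one class and at least `ratio`
    times as often there as in the other class (add-one smoothing avoids
    dividing by zero when a phrase never appears in the other class).
    """
    pos = Counter(p for review in positive for p in set(bigrams(review)))
    neg = Counter(p for review in negative for p in set(bigrams(review)))
    pos_phrases, neg_phrases = [], []
    for phrase in pos.keys() | neg.keys():
        p, n = pos[phrase], neg[phrase]
        if p >= min_count and (p + 1) / (n + 1) >= ratio:
            pos_phrases.append(phrase)
        elif n >= min_count and (n + 1) / (p + 1) >= ratio:
            neg_phrases.append(phrase)
    return sorted(pos_phrases), sorted(neg_phrases)

# Hypothetical sample reviews.
positive_reviews = [
    "Clean floors and friendly cashiers.",
    "Friendly cashiers, clean floors, great fries.",
    "The fries were hot.",
]
negative_reviews = [
    "Slow service and cold food.",
    "Cold food again, slow service.",
    "Slow service, the fries were cold.",
]

pos_phrases, neg_phrases = polarized_phrases(positive_reviews, negative_reviews)
print(pos_phrases)  # ['clean floors', 'friendly cashiers']
print(neg_phrases)  # ['cold food', 'slow service']
```

Neutral phrases such as “the fries” show up in both classes, so the ratio test filters them out. The resulting lists are the kind of data you might load into a database for the dashboard described above.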
Natural Language Querying
Now that you’ve got this interesting data, you may want to expose it to users via a Natural Language Query interface (perhaps using a tool like ThoughtSpot). With an NLQ interface, a novice user could ask questions like “What were the most commonly used negative keywords in New England?” and receive a helpful bar chart. NLQ lowers the barrier to using an analytics platform by reducing the technical and analytical expertise needed to produce visualizations of your data.
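Real NLQ tools handle far more variety than this, but a toy sketch can show the core move of turning a question into a database query. The snippet below assumes a hypothetical `review_phrases` table with one row per phrase mention, plus its sentiment and region. It matches a single question pattern with a regular expression and runs the generated SQL against an in-memory SQLite database. The pattern, table, region list, and sample rows are illustrative assumptions. They do not describe how ThoughtSpot or any other product works.

```python
import re
import sqlite3

# Hypothetical list of regions this toy parser knows how to recognize.
KNOWN_REGIONS = ["New England", "Midwest", "Pacific Northwest"]

def to_sql(question: str) -> tuple[str, tuple]:
    """Translate one supported question pattern into a parameterized SQL query."""
    q = question.lower()
    match = re.search(r"most commonly used (positive|negative) (?:keywords|phrases)", q)
    if match is None:
        raise ValueError(f"Sorry, I can't answer: {question!r}")
    sql = ("SELECT phrase, COUNT(*) AS mentions FROM review_phrases "
           "WHERE sentiment = ?")
    params = [match.group(1)]
    # Add a region filter if the question names a region we know about.
    for region in KNOWN_REGIONS:
        if region.lower() in q:
            sql += " AND region = ?"
            params.append(region)
            break
    sql += " GROUP BY phrase ORDER BY mentions DESC LIMIT 10"
    return sql, tuple(params)

# Build a tiny in-memory table with hypothetical rows (one row per mention).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_phrases (phrase TEXT, sentiment TEXT, region TEXT)")
conn.executemany("INSERT INTO review_phrases VALUES (?, ?, ?)", [
    ("slow service", "negative", "New England"),
    ("slow service", "negative", "New England"),
    ("cold food", "negative", "New England"),
    ("cold food", "negative", "Midwest"),
    ("friendly cashiers", "positive", "New England"),
])

sql, params = to_sql("What were the most commonly used negative keywords in New England?")
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('slow service', 2), ('cold food', 1)]
```

The query rows are what a charting layer would then turn into the bar chart. Parameterized placeholders (`?`) keep user-supplied values out of the SQL text itself.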
Natural Language Generation
While visualizations certainly have a place in communication, they also require skill to interpret and extract value from. Franchise managers don’t want a complex dashboard of positive and negative terms used in reviews of their franchise; they care about increasing their profit, not clicking and filtering a chart. Helping busy people make impactful, data-driven decisions is exactly where Natural Language Generation shines. Imagine if, instead of receiving a link to a dashboard, each franchise manager got a weekly report that included language like:
“Last week, you received six negative reviews, four of which mentioned ‘slow service.’ Nationwide, franchises that have been labeled as providing ‘slow service’ typically earn 8% less in profits each week than other franchises. Consider working with your employees and stressing the importance of fast, friendly service to improve your financials in the future.”
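A report like this could be produced in many ways. One of the simplest is template-based NLG, in which facts from the data fill slots in prewritten sentences and simple rules decide which sentences to include. Here is a minimal Python sketch under that assumption. The `weekly_report` function, its input keys, and the `ADVICE` lookup table are hypothetical. The 8% figure is simply the example value from the report above.

```python
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

# Hypothetical mapping from a problem phrase to a suggested action.
ADVICE = {
    "slow service": ("Consider working with your employees and stressing the importance "
                     "of fast, friendly service to improve your financials in the future."),
}

def number_word(n: int) -> str:
    """Spell out small counts ("six") and fall back to digits for larger ones."""
    return NUMBER_WORDS[n] if 0 <= n < len(NUMBER_WORDS) else str(n)

def weekly_report(stats: dict) -> str:
    """Pick and fill sentence templates based on one franchise's weekly stats."""
    neg = stats["negative_reviews"]
    if neg == 0:
        return "Last week, you received no negative reviews."
    phrase, hits = stats["top_phrase"], stats["phrase_mentions"]
    plural = "s" if neg != 1 else ""
    if hits == 0:
        return f"Last week, you received {number_word(neg)} negative review{plural}."
    sentences = [
        f"Last week, you received {number_word(neg)} negative review{plural}, "
        f"{number_word(hits)} of which mentioned ‘{phrase}.’"
    ]
    # Only mention the nationwide comparison if we have a figure for it.
    gap = stats.get("profit_gap_pct")
    if gap:
        sentences.append(
            f"Nationwide, franchises that have been labeled as providing ‘{phrase}’ "
            f"typically earn {gap:g}% less in profits each week than other franchises."
        )
    if phrase in ADVICE:
        sentences.append(ADVICE[phrase])
    return " ".join(sentences)

report = weekly_report({"negative_reviews": 6, "top_phrase": "slow service",
                        "phrase_mentions": 4, "profit_gap_pct": 8})
print(report)
```

With these inputs the function reproduces the example report. A franchise with no negative reviews gets a one-sentence message instead.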
Natural Language Understanding
Finally, you could deliver those kinds of insights through Natural Language Understanding in a conversational context. In contrast to NLQ, which requires users to ask the right questions, an NLU-based system would genuinely understand the user’s intent and be able to answer the “question behind the question” via NLG. For instance, an NLQ system would translate a question like “Who were the top three salespeople in 2017?” into a database query and return a bar chart highlighting the top three salespeople.
In contrast, an NLU system would understand the intent behind the person’s question. Such a system would then be able to perform a variety of analytics against the data to truly comprehend the situation in order to surface the most valuable and relevant insights back to the user. For instance, the system might use NLG to respond with something like “Mark Jones remains your top salesperson, with $480k in sales last year. But the real story is Carmela Vasquez, who improved to $440k in sales in 2017, up from only $120k in 2016…”
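This sketch does not show the hard part of NLU, which is recognizing that the user really wants to know who matters and why. Assuming the system has already recognized that intent, the sketch illustrates the kind of extra analysis it might run. It ranks salespeople for the requested year, finds the largest year-over-year improvement, and phrases both results with a simple NLG template. Sales figures are in thousands of dollars. Mark Jones’s 2016 figure and the third salesperson, Alex Kim, are hypothetical. The other numbers come from the example above.

```python
# Sales in thousands of dollars. Mark Jones's 2016 figure and Alex Kim are
# hypothetical; the other figures come from the example response above.
SALES = {
    "Mark Jones": {2016: 470, 2017: 480},
    "Carmela Vasquez": {2016: 120, 2017: 440},
    "Alex Kim": {2016: 400, 2017: 390},
}

def top_n(sales: dict, year: int, n: int = 3) -> list[str]:
    """The literal NLQ-style answer: the n highest sellers in `year`."""
    return sorted(sales, key=lambda name: sales[name][year], reverse=True)[:n]

def biggest_improver(sales: dict, year: int) -> str:
    """The salesperson with the largest gain over the previous year."""
    return max(sales, key=lambda name: sales[name][year] - sales[name][year - 1])

def answer(sales: dict, year: int) -> str:
    """Combine both analyses into a short narrative response."""
    leader = top_n(sales, year, 1)[0]
    # Say "remains" only if the same person also led the previous year.
    verb = "remains" if leader == top_n(sales, year - 1, 1)[0] else "is"
    text = (f"{leader} {verb} your top salesperson, "
            f"with ${sales[leader][year]}k in sales in {year}.")
    riser = biggest_improver(sales, year)
    if riser != leader:
        text += (f" But the real story is {riser}, who improved to "
                 f"${sales[riser][year]}k in sales in {year}, "
                 f"up from only ${sales[riser][year - 1]}k in {year - 1}.")
    return text

print(top_n(SALES, 2017))
print(answer(SALES, 2017))
```

`top_n` gives the same result as the NLQ bar chart. `biggest_improver` is the extra analysis that turns that list into the “real story” about Carmela Vasquez.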
Hopefully, this example shows how different natural language technologies can work together, and the power of conversational interactions backed by NLU and NLG technologies in particular. It’s certainly an exciting time to be working at Narrative Science and helping push these technologies forward!