Have you recently had a frustrating or confusing conversation with a computer?

It’s silly to admit, but once in a while I find myself screaming at Alexa (i.e., the persona of my Amazon Echo) because I don’t like what she’s telling me. Rationally, I know it’s not her fault that impending thunderstorms will probably ruin my vacation plans, but it’s easy to get caught up in the role-play when your computer talks to you like a human. Artificial intelligence architects call this capability natural language generation (NLG).

A bigger challenge for solution developers who build NLG into their designs is communicating in a way their customers will understand and appreciate. With recent advances in NLG technology and the products built on it, such as Alexa, customers expect more than just any response; they expect an intelligently articulated response that’s both comprehensible and stimulating.

I feel solution designers are missing the mark here. If your next product or service involves having a conversation with customers, make sure your system is talking at their level, not above or below it.

Competitive strategy: Start simple

Readability is the industry term that’s commonly used to gauge whether your text is appropriate for your audience, though I prefer the term understandability. Most people can read and even pronounce the word gallimaufry, but not many people understand what it means without context (it means a jumble or hodgepodge). Your real goal in NLG is ease of comprehension, but we’ll stick with readability for now, since that’s the term the industry has adopted.

Don’t build an NLG engine without a readability subcomponent. Before your system “says” anything, it should know the readability of the language it’s about to produce. Granted, readability tests apply most directly to written text; however, they’re a good proxy for spoken language, because most people subvocalize when they read (i.e., they say the words in their head as they read).

You can be very competitive with even a simple readability test. The most widely used and recognized readability tests are the two Flesch-Kincaid tests: one scores reading ease, and the other scores the US grade level appropriate for the text. For most intents and purposes the two are interchangeable, because they use the same inputs with different weights; a high reading-ease score corresponds to a low grade level. You’ll find these scores all over the place, including in Microsoft Word. The Flesch-Kincaid tests use the ratio of words to sentences and the ratio of syllables to words to produce their scores. An even simpler test is the Coleman-Liau index, which uses the ratio of letters to words instead of syllables to words.
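
To make the ratios concrete, here is a minimal Python sketch of the three scores using their standard published coefficients. The tokenizer and the syllable counter are assumptions for illustration: the syllable counter is a rough vowel-group heuristic (production tools typically use a pronunciation dictionary), and all function names are hypothetical.

```python
import re

def _words(text: str) -> list[str]:
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)

def _sentence_count(text: str) -> int:
    """Count runs of terminal punctuation; a fragment with none counts as one sentence."""
    return max(1, len(re.findall(r"[.!?]+", text)))

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, then drop a silent trailing 'e'."""
    w = word.lower().strip("'")
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(1, count)

def _ratios(text: str) -> tuple[float, float]:
    """Return (words per sentence, syllables per word)."""
    words = _words(text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return len(words) / _sentence_count(text), syllables / len(words)

def flesch_reading_ease(text: str) -> float:
    """Higher means easier to read."""
    wps, spw = _ratios(text)
    return 206.835 - 1.015 * wps - 84.6 * spw

def flesch_kincaid_grade(text: str) -> float:
    """Approximate US school grade level."""
    wps, spw = _ratios(text)
    return 0.39 * wps + 11.8 * spw - 15.59

def coleman_liau_index(text: str) -> float:
    """Grade level from letters per word instead of syllables per word."""
    words = _words(text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for w in words for ch in w)
    L = letters / len(words) * 100                 # letters per 100 words
    S = _sentence_count(text) / len(words) * 100   # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

sample = "The cat sat on the mat."
print(flesch_reading_ease(sample), flesch_kincaid_grade(sample), coleman_liau_index(sample))
```

Note that the same two ratios drive both Flesch-Kincaid scores, with opposite signs, which is why they move together.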

Any of these tests can quickly and easily be plugged into your NLG engine to improve the chances that your audience understands what your system is trying to say. If you want your solution to be more distinctive in the marketplace, you should look to something that’s more sophisticated than Flesch-Kincaid or Coleman-Liau.
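
One way to plug a test into an NLG engine is as a gate that scores several candidate phrasings and keeps the one closest to a target grade level. The sketch below assumes your engine can produce multiple candidates; `pick_utterance` and the target grade are hypothetical, and Coleman-Liau is used only because it needs no syllable counting.

```python
import re

def coleman_liau(text: str) -> float:
    """Coleman-Liau index: letters per word and sentences per word, scaled to 100 words."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters = sum(ch.isalpha() for w in words for ch in w)
    L = letters / len(words) * 100     # letters per 100 words
    S = sentences / len(words) * 100   # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

def pick_utterance(candidates: list[str], target_grade: float) -> str:
    """Return the candidate whose estimated grade level is closest to target_grade."""
    return min(candidates, key=lambda c: abs(coleman_liau(c) - target_grade))

candidates = [
    "Rain is likely. Bring a coat.",
    "Precipitation probability remains considerable; appropriate outerwear is recommended.",
    "Rain is likely this afternoon, so you may want to bring a coat.",
]
print(pick_utterance(candidates, target_grade=6.0))
```

Choosing the closest candidate, rather than the easiest, avoids always talking down to the listener.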

Distinctive strategy: Are you smarter than a fourth grader?

The obvious shortcoming of these methods is that they gauge complexity by either characters or syllables per word. Many polysyllabic words (three syllables or more) are easy to understand: distinctive, marketplace, and understand are three examples I just used. There are also monosyllabic (one-syllable) words, like daft or dolt, that many people don’t know (daft means silly, and a dolt is an idiot). To compensate, a number of readability tests categorize words based on their complexity or difficulty.

On the surface, the Gunning fog index seems to be heading in this direction by introducing the concept of complex words; yet when you dig a little deeper, you find that it treats any polysyllabic word as complex. So we’re back to the same problem we had before! Why did I bring this up? Because I like where Gunning fog is heading with the idea of a complex word; I just don’t like its definition.
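
A minimal sketch of the Gunning fog index makes the flaw visible. It implements the simplified definition above (three or more syllables means complex); Gunning’s full guidelines exclude some words, such as proper nouns, which this sketch ignores. The syllable counter is the same rough vowel-group heuristic, an assumption rather than a dictionary lookup.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, then drop a silent trailing 'e'."""
    w = word.lower().strip("'")
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(1, count)

def gunning_fog(text: str) -> float:
    """0.4 * (words per sentence + percentage of complex words)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    # Simplified definition: any word of three or more syllables counts as complex.
    complex_words = sum(count_syllables(w) >= 3 for w in words)
    return 0.4 * (len(words) / sentences + 100 * complex_words / len(words))

print(gunning_fog("The daft dolt sat."))
print(gunning_fog("We understand the marketplace."))
```

The first sentence, full of words many readers won’t know, scores 1.6; the second, built from familiar words, scores 21.6, because understand and marketplace each have three syllables.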

Edgar Dale and Jeanne Chall created a better method in the late 1940s, aptly named the Dale-Chall readability score. Dale and Chall recognized the limitations of Rudolf Flesch’s syllable-based reading-ease formula, the forerunner of the Flesch-Kincaid tests, and improved on it with the concept of difficult words. Their definition of a difficult word is any word a fourth-grade student is not familiar with. They originally created a set of 763 words that were familiar to fourth graders at the time, and anything outside of this set was considered difficult. The list of familiar words was expanded to 3,000 in 1995 (way to go, fourth graders!). This is a pretty good plug-and-play test for your NLG engine, but since we’re data scientists, we can do a lot better.
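
Here is a minimal sketch of the raw Dale-Chall score, using the published weights for the percentage of difficult words and average sentence length, plus the fixed adjustment applied when more than 5% of words are difficult. The tiny `FAMILIAR_WORDS` set is a hypothetical stand-in for the real list, and the published method’s rules for counting inflected word forms are skipped.

```python
import re

# Hypothetical stand-in for the Dale-Chall familiar-word list (about 3,000 words).
FAMILIAR_WORDS = {"the", "a", "cat", "sat", "on", "mat", "dog", "ran"}

def dale_chall_score(text: str, familiar: set[str] = FAMILIAR_WORDS) -> float:
    """Raw Dale-Chall score from difficult-word percentage and sentence length."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    pdw = 100 * sum(w not in familiar for w in words) / len(words)  # % difficult words
    asl = len(words) / sentences                                  # average sentence length
    score = 0.1579 * pdw + 0.0496 * asl
    if pdw > 5:
        score += 3.6365  # adjustment applied above 5% difficult words
    return score

print(dale_chall_score("The cat sat on the mat."))
print(dale_chall_score("The gallimaufry sat on the mat."))
```

A single unfamiliar word like gallimaufry raises the score sharply, regardless of how many syllables it has.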

Breakthrough strategy: A custom-tailored approach

To create a breakthrough solution, you should extend the Dale-Chall idea into a custom-tailored score for your customers. Basing your readability score on education level is a reasonable proxy for most circumstances, but it has a fatal flaw if your customer base spans multiple education levels; you’ll be forced to cater to the lowest education level in your target customer base.

The Oregon Department of Administrative Services instructs its writers to write at a 10th-grade reading level, especially when addressing governors, legislators, and news people (this is the minimum standard that’s recommended). I’m quite sure many in this audience are educated well past 10th grade, but the popular wisdom is to dumb down your language so it’s easier to read and understand. I wholeheartedly disagree. I enjoy and appreciate college-level texts much more than anything that’s been dumbed down.

Instead, you should understand what your target audience is talking about and create your own score. Scan, collect, and organize the exact words that your target audience is using now. With a brute-force, big data approach, you won’t even have to vet this with your target audience; just ensure a very large sample that accurately represents the discussions your target audience is currently having. Use this sample to build your set of familiar words; by inversion, you can then count the difficult words your solution is about to utter. Then use statistical modeling or a learning algorithm to work out the rest of the formula (for example, how heavily to weight difficult words against sentence length), and voilà: your scoring algorithm is alive and functional.
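
The steps above can be sketched in Python under a few assumptions: the familiar-word set is every word that appears at least `min_count` times in a corpus of your audience’s own text, the features mirror Dale-Chall (difficult-word percentage and sentence length), and the weights are fit with ordinary least squares against hypothetical comprehension ratings you would have to collect. Least squares is one plain choice of learning algorithm, not the only one; all names here are illustrative.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return [w.lower() for w in re.findall(r"[A-Za-z']+", text)]

def build_familiar_words(corpus: list[str], min_count: int = 2) -> set[str]:
    """Words your audience actually uses, kept if seen at least min_count times."""
    counts = Counter(w for doc in corpus for w in tokenize(doc))
    return {w for w, c in counts.items() if c >= min_count}

def features(text: str, familiar: set[str]) -> list[float]:
    """[percent difficult words, average sentence length, intercept term]."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    pdw = 100 * sum(w not in familiar for w in words) / len(words)
    return [pdw, len(words) / sentences, 1.0]

def fit_weights(rows: list[list[float]], targets: list[float]) -> list[float]:
    """Ordinary least squares: solve the normal equations (X^T X) w = X^T y."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("examples are too similar to fit every weight")
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [v - factor * p for v, p in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def custom_score(text: str, familiar: set[str], weights: list[float]) -> float:
    """Apply the fitted weights to a new utterance."""
    return sum(w * x for w, x in zip(weights, features(text, familiar)))
```

In use, you would call `build_familiar_words` on the large audience sample, compute `features` for a set of rated example texts, pass those rows and ratings to `fit_weights`, and then score each utterance with `custom_score` before the system speaks.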


The rise in popularity of computers and other systems that can hold a conversation with humans has made data scientists responsible for ensuring that the conversation is both understood and appreciated. If your system talks over customers’ heads, they’ll be lost; if it talks down to them, they won’t appreciate or respect it. Your goal is to design a system that talks directly at their level.

To do this, you must know the discussions your target customers are having and use that information to build a filter for your NLG system that will prevent it from saying the wrong thing. This is one area where computers can really outshine humans.
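
Because the goal is to avoid both talking over and talking under the audience, a filter like this can be two-sided. The sketch below assumes a familiar-word set built from your audience’s own discussions and a hypothetical acceptable band for the percentage of unfamiliar words; the bounds shown are placeholders you would tune, not recommended values.

```python
import re

def percent_difficult(text: str, familiar: set[str]) -> float:
    """Percentage of words not in the audience's familiar-word set."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not words:
        raise ValueError("text contains no words")
    return 100 * sum(w not in familiar for w in words) / len(words)

def level_filter(candidates: list[str], familiar: set[str],
                 low: float, high: float) -> list[str]:
    """Keep candidates that are neither too simple (< low) nor too hard (> high)."""
    return [c for c in candidates if low <= percent_difficult(c, familiar) <= high]

# Hypothetical familiar words and band, for illustration only.
familiar = {"rain", "is", "likely", "so", "bring", "a", "coat"}
candidates = [
    "Rain is likely.",
    "Rain is likely, so bring a waterproof coat.",
    "Precipitation is likely.",
]
print(level_filter(candidates, familiar, low=5.0, high=20.0))
```

If no candidate survives, the engine can regenerate or fall back to the closest one rather than say the wrong thing.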

Don’t you wish your colleagues could apply a readability test in real time when talking to you?