If you've tried to conduct a conversation with Amazon Alexa, you know just how stilted it can be. "Alexa, will you..." followed by "I don't understand what you mean." The Holy Grail, of course, would be to carry on a multi-faceted conversation, with Alexa responding to a command like "play me some music" with a question of which kind, how loud, etc.
Such a voice-driven future is just that, however: the future. Why? Well, it turns out that Alexa plays dumb on purpose.
What do you mean?
At least, that's what Adam Radziszewski, a machine learning and natural language processing (NLP) expert with Infermedica, argues. Alexa, Siri, and other voice assistants are hampered, in part, by the difficulty in discovering just what their capabilities may be. According to Radziszewski, however, there are far more foundational issues with their mechanics.
Amazon, for starters, doesn't want users trying to converse with Alexa. Given the state of NLP and conversational voice UI, this makes sense. On the Alexa developer portal, most (90%) of the example apps are simple command-style interfaces, with just one conversational UI. Why? Radziszewski has posited, "Amazon may have anticipated that giving developers too much freedom would result in a constant lack of understanding, which, at the end of the day, would be blamed on Alexa itself."
For developers working with Alexa, Radziszewski said, the framework rigidly steers them away from attempting conversations:
"Alexa's interaction model is not conversation-friendly. In Alexa's terms, this sort of interaction may be achieved using the 'custom skill' type. However, if you're expecting to find a convenient framework for implementing a custom chat, you'll be disappointed. Even when using a 'custom skill', you're forced to enumerate a fixed set of intents that you allow the user to express. An intent might be a desire to order coffee or find movie titles. Confirmation (saying 'yes' in response to a question) and denial ('no') are also intents. Some intents may have slots for parameters, such as a city name or type of coffee. You have to specify the type of each slot; you can either use one of the predefined types, or you need to - you guessed it - enumerate a fixed set of values. It's hard to imagine a casual chat having this level of rigor."
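To make that rigidity concrete, here is a minimal Python sketch of the kind of model Radziszewski describes: a fixed list of intents, some with slots whose values must also be listed in advance. It is an illustration only, not Alexa's actual schema or API; the names Slot, Intent, MODEL, and resolve, and the sample coffee types, are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    allowed_values: frozenset  # the fixed set of values this slot accepts


@dataclass
class Intent:
    name: str
    slots: list = field(default_factory=list)


# Hypothetical model: every intent and every slot value is enumerated up front.
MODEL = [
    Intent("OrderCoffee", [Slot("coffee_type", frozenset({"latte", "espresso", "mocha"}))]),
    Intent("Confirm"),  # the user says "yes"
    Intent("Deny"),     # the user says "no"
]


def resolve(intent_name, slot_values):
    """Return (intent name, slots) if the request fits the model, else None."""
    for intent in MODEL:
        if intent.name != intent_name:
            continue
        for slot in intent.slots:
            # A missing or unlisted value means the request does not fit.
            if slot_values.get(slot.name) not in slot.allowed_values:
                return None
        return intent.name, dict(slot_values)
    return None  # intent was never enumerated, so it cannot be expressed
```

In this sketch, resolve("OrderCoffee", {"coffee_type": "latte"}) succeeds, while a request for "chai" or an unlisted intent such as "SmallTalk" comes back as None. Anything the developer did not anticipate simply has nowhere to go, which is the opposite of how a casual chat works.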
Alexa's struggles to understand could be improved if developers could apply domain-specific nomenclature, but "Alexa offers just one general-purpose speech recognizer with no way of injecting extra-linguistic context," Radziszewski said.
A huge mountain to climb
None of which is meant to be critical of Amazon, but rather to acknowledge just how hard a task it has set itself. We're nowhere near proficiency with general AI that has the ability to broadly "understand" language and respond accordingly. Amazon is therefore wise to corral developers into simpler modes of verbal engagement.
Yes, there are ways to trick the Alexa development framework, Radziszewski noted. "What you need to do is force Alexa into thinking that: 1) Whatever the user says fits into a one-and-only catch-it-all intent. 2) This intent itself is one huge slot of the AMAZON.LITERAL type," he said. Amazon strongly discourages this approach, however.
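The shape of that workaround can be sketched in a few lines of Python. Again, this is a rough illustration under stated assumptions rather than Alexa's real configuration format: the dictionary layout, the names catch_all_model and handle, the intent name "CatchAll", the slot name "text", and the chat_backend callback are all hypothetical. Only the AMAZON.LITERAL type name comes from Radziszewski's description.

```python
def catch_all_model():
    """A model with a single intent whose only slot swallows the whole utterance."""
    return {
        "intents": [
            {
                "name": "CatchAll",  # hypothetical one-and-only intent
                "slots": [{"name": "text", "type": "AMAZON.LITERAL"}],
            }
        ]
    }


def handle(utterance, chat_backend):
    """Route every utterance through the single intent to the developer's own chat logic."""
    model = catch_all_model()
    intent = model["intents"][0]
    slot_name = intent["slots"][0]["name"]
    # The platform's intent matching is effectively bypassed: the raw text
    # lands in one slot and is handed to whatever NLP the developer supplies.
    request = {"intent": intent["name"], "slots": {slot_name: utterance}}
    return chat_backend(request["slots"][slot_name])
```

Here chat_backend stands in for the developer's own language understanding, so handle("tell me a joke", my_bot) would simply pass the raw text to my_bot. The sketch also hints at why the approach is discouraged: all of the work of understanding the user moves out of the framework and onto the developer.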
What is encouraged is patience and coloring inside the Alexa lines until the NLP and machine learning can catch up to more advanced user expectations. Until then, we're going to have to get comfortable with pretty basic interaction with our voice bots which, to be clear, is pretty darn amazing in itself.
Matt is currently head of the developer ecosystem at Adobe. The views expressed are his own, not those of his employer.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.