Trust the marketing department to screw up the voice revolution. Even as Gartner tells us that 30% of browsing will be voice-driven by 2020, the experience of voice is fraught with peril: namely, the peril of annoying consumers to the point that they unplug their Teddy Ruxpin-esque talking gadgets and revert to a monastic computing experience.

Speak friend

One prophet of the so-called “voice revolution,” Brian Roemmele, argues that the reason we still type on keyboards has little to do with efficiency, and everything to do with the paucity of our underlying systems: “The fundamental reason humans have been reduced to tapping on keyboards made of glass is simply because the computer was not powerful enough to understand our words let alone even begin to decode our intent.”

Now that computers have become more powerful and better able to grok human speech, the potential for voice-driven computing should skyrocket.

“Should” is the key word in that sentence. For anyone who has barked commands at Apple’s Siri or tried desperately to get Alexa to turn off a timer, the hype around voice and the reality still diverge dramatically. I’m an avid backcountry skier and often ruin the tranquility of the morning mountain air with increasingly loud demands that Siri read or respond to texts as I scale a mountain. Unless my phone is out of my pocket, right next to my mouth (with a full cellular signal), Siri is deaf to my pleas. Amazon’s Alexa is much better from this perspective, able to hear me across our sometimes noisy kitchen, but even she is somewhat tone-deaf, able to process only basic commands.

However, the biggest problem with voice isn’t that it sometimes doesn’t work as advertised, but rather what would happen if it did.

Babel away

The problem is that voice has the potential to engender not just one conversation with one brand, but conversations with many brands at once. If you think this seems like a good thing, Gartner analyst Augie Ray has an ugly reminder for you: “Let’s stop for a moment to recall how marketers excited about email and social media behaved, flooding your inbox and newsfeed with ads. Now imagine every brand in your life getting access to your Alexa or Home.”

Sound terrifying? It should, and not merely because of the annoyance factor, as Ray wrote:

Is that our voice future? Needing a voice spam filter? Having to tell Alexa to shut up? Walking around our house afraid to speak because anything we say could be turned into a marketing opportunity? Or worse? (“Man, do I feel hungover. That party really burned down the house!” Alexa responds, “I regret to inform you that your homeowner’s policy was canceled and your health care premium just rose 15%.”)

Arguably, some voice assistants will be better than others in this regard. Apple, for example, has shown restraint in its software services, refraining from obvious breaches of consumer privacy in the interest of data-driven advertising. Amazon and Google, however, are set up to a) sell and b) advertise, respectively, and it’s hard to see them subduing the impulse to do what their entire companies are constructed to pursue.

Of course, the real problem in these voice channels isn’t Amazon or Google, but rather the brands that genuinely want to do right by their customers.

“A personalized offer in the right channel at the right time can never be wrong, can it? (Switch the perspective to see things from the customer’s perspective and ask if 500 or more brands all speaking the right offer at the right time on a Google Home speaker is a desirable situation.)” Ray wrote. With voice, as with other online experiences, we’re never talking about a single brand engaging customers, but many.

Which is why we’re still a ways off from figuring out how to make voice work from both a technical perspective and an experience perspective. As hard as artificial intelligence (AI) and machine learning are proving to be, they’re arguably simple compared to the larger question of how to engage consumers with voice without deafening them with spammy noise.