Voice control: Speaking is better than swiping during the coronavirus

A survey shows that people worry about privacy and security when using voice control, but they like not having to touch anything.


Using your voice instead of your hands to control the devices all around you sounds like the way to go during the coronavirus pandemic.

A small survey by Syntiant found that people are turning to this method of interacting with devices more frequently, with Generation Z the most likely to increase its use of speech control due to COVID-19. At least 50% of every generation has tried voice control: 81% of Gen Z and Millennials have done so, compared with 68% of Generation X and 51% of Baby Boomers. Half of survey respondents list privacy and security as their top concerns about using voice recognition.


Kurt Busch, CEO of Syntiant, predicts that adoption of voice interactions will continue to grow among various demographics as artificial intelligence (AI) technology becomes more pervasive at the local level.  

"It's clear that the current pandemic is driving demand for voice control, as people refrain from touching their devices in hopes of reducing health risks," Busch said.

Syntiant builds always-on voice applications for battery-powered devices, including smartphones, earbuds, wearables, remote controls, drones, security cameras, and sensors.

Survey respondents were most interested in using voice control for smartphones, smart TVs, and home appliances. The technology also has many relevant use cases outside the home, including at hospitals, factories, and call centers. Hospitals and companies are using chatbots to respond to high volumes of calls about the coronavirus.

Voice recognition technology still has some significant limitations. A recent study found that all five major automated speech recognition (ASR) systems had more trouble understanding black speakers than white ones. The average word error rate was 0.35 for black speakers compared with 0.19 for white speakers. The authors found that the race gap was equally large on a set of identical phrases spoken by black and white individuals in the data set.
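For readers unfamiliar with the metric, word error rate is commonly computed as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A rate of 0.35 therefore means roughly 35 errors per 100 spoken words. The following is a minimal Python sketch of that general definition. It is not the study's own code, the function name and the sample sentences are hypothetical, and the researchers' exact text normalization is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Generic illustration of the metric; the lowercasing and whitespace
    splitting used here are simplifying assumptions.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word ("on") and one substituted word
# ("lights" heard as "light") out of five reference words.
print(word_error_rate("turn on the kitchen lights", "turn the kitchen light"))
```

In this made-up example the transcript has two errors against a five-word reference, so the rate is 0.4.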

The researchers, from Stanford University, tested how well ASR systems from Amazon, Apple, Google, IBM, and Microsoft transcribed structured interviews conducted with 42 white speakers and 73 black speakers. The research data included people from five US cities and almost 20 hours of audio.

The authors of the paper attribute this problem to the underlying acoustic models used to train the ASR systems. They also pointed out the need to audit emerging machine-learning systems to ensure they are trained on voices from a wide variety of people.

Syntiant's online survey was conducted April 22-24, 2020, by Engine Insights among a statistically viable population of adults 18 and older, weighted by age, gender, geographic region, race, and education. The generations are defined as Generation Z (ages 18-23), Millennials (ages 24-39), Generation X (ages 40-55), and Baby Boomers (ages 56-74).


[Image: syntiant-voice-survey.jpg. Credit: Syntiant]