Bill Gates wasn’t wrong when he predicted that one day speech would popularize the Internet. When you consider that most people have at least one phone, the potential audience for voice-driven access is enormous.
In the 1980s, speech-recognition software was lauded as the hottest technology of the day. Imagine speaking to your computer and having it type out whatever you say in perfect grammar. Or imagine conversing with an “automated attendant” (a computer) that sounds eerily human and having it retrieve and answer your e-mail and take care of a host of office chores, just short of fetching coffee.
Today, the venture-capital community is once again lauding the potential of speech-recognition software. Although the technology is far from perfect, leading experts say it has made great strides in the past few years. The dramatic improvement in PC processor speed, the growth of the Internet, and the explosive popularity of mobile phones are responsible for the current interest in voice technology.
Many telecom pundits insist the telephone will be the key to future Internet success. Leading telecoms and software companies are combining the telephone with voice-recognition technology to make the Web accessible worldwide.
Internet sites with voice-enabled content delivery, or voice portals, are capturing a lot of attention. Speech-technology advocates envision a day when millions of people will surf the Web by talking into their mobile phones rather than by sitting at a computer.
Research firm International Data Corp. (IDC) projects that the number of people using Web-enabled wireless devices will jump to 61.5 million in 2003, up from 7.4 million last year.
Companies such as BeVocal, Nuance, and SpeechWorks offer voice-enabled content services. Tellme Networks offers a service allowing users to search the Web using spoken commands. And Irvine, CA-based Sound Advantage offers what it calls a “voice user interface,” commonly referred to as CPE (customer premises equipment). Simply put, it’s a voice-activated call-processing system, or “electronic receptionist,” invented by the company’s founder and CEO, Michael Metcalf.
The centerpiece is “SANDi,” which Metcalf calls the “perfect receptionist,” and which connects to your existing phone system to automatically answer calls. It can handle multiple calls, screen incoming calls, forward urgent calls to designated numbers, retrieve and send e-mails, and provide a host of other services. If you didn’t know SANDi was a computer, you’d think it was human.
But that’s only a smattering of the products on the market.
Jay Wilpon, director of advanced speech research at AT&T Labs in Florham Park, NJ, has been researching speech-recognition technology for 20 years. And many major corporations such as AT&T, IBM, and ITT have been immersed in speech-recognition research since the late ’60s and early ’70s.
By the mid-1980s, NEC, Lucent Technologies, Nortel Networks, and Microsoft had joined the fold. But the biggest advances have been made over the past decade, according to Wilpon. “Now, every couple of months another player hypes a new speech-recognition product.”
The biggest challenge is creating a flexible model that can, for example, recognize accents from different parts of the country, such as Chicago, New York, the South, or Boston. “Some of the milestones have nothing to do with the speech,” Wilpon said, “but in changes in microelectronics.”
According to Metcalf, “The idea is to make this technology so simple anyone can use it and not have to deal with features and functions.”
Fast-forward a half-century, and all communication equipment will be “device-oriented,” Wilpon predicted. Imagine having your television, refrigerator, stove, and vacuum cleaner all capable of responding to voice commands. Sound weird? It’s on the horizon.
Wilpon said there are openings in the voice technology field for engineers, mathematicians, computer scientists, and linguists, to name a few. “Opportunities exist on both the application and technology side,” he said. “The goal is to create better technology.”
Metcalf said he needs developers and programmers who are experienced in object-oriented programming, C++, and Java. “We hire senior programmers and teach them what they need to know, specifically ASR (automatic speech recognition), TTS (text-to-speech) technology, and the architecture of our platform.” He also needs marketing and sales staff.
Put it all together and you’ll find opportunities at all ends of the business. This is the time to jump into this still-emerging industry.
Is your company using any products that feature speech-recognition software? Do they make work easier? Are they more trouble than they’re worth? Send us an e-mail or start a discussion.