You may be among the millions of IT workers who are eagerly awaiting the day when speech recognition technology will make your life easier.
That day is not far off, say vendors who sell the applications and devices that enable speech recognition on the PC, phone, and handheld.
What can you expect from speech recognition products currently on the market and those soon to be available? We looked at the offerings and talked to those in the know—here’s what we found.
Speech or voice recognition is the ability of a machine or program to recognize and carry out voice commands or take dictation. In general, speech recognition involves the ability to match a voice pattern against a provided or acquired vocabulary. (Definition courtesy of WhatIs.com)
Meeting diverse needs in the workplace
Speech recognition is already gaining popularity among workers with a variety of needs: busy executives who dictate memos, workers with carpal tunnel syndrome, people with disabilities, lab workers who need to keep their hands free to use a microscope, and business travelers who tape-record information for later transcription.
The simplest desktop speech-enabling programs, most commonly used for dictation, screen navigation, and Web browsing, retail for around $40. Philips Consumer Electronics’ FreeSpeech 2000, for example, is a speech-enabling package that offers these features for the PC.
“If you’re someone who creates a fair amount of documents and you can’t type, you’re going to get a lot of use from it,” said Rick Gallahan, director of marketing services for Philips Speech Processing, of FreeSpeech 2000. “Instead of having to struggle at the keyboard, you grab the microphone and verbally assemble the text. It’s a productivity tool. Beyond that, there may be applications that perhaps can be used for client billing or chart tracking. For an application that is keyboard intensive, you make it more user-friendly by speaking to the application.”
How does it work?
Speech recognition programs, sometimes called natural language programs, let users speak in normal conversational patterns while controlling the computer through spoken commands. Users can also dictate text, and the more expensive programs accept speech at up to 140 to 160 words per minute.
Here’s how it works: the user “trains” the software by repeating keywords so it can learn the user’s distinctive speech patterns. After training, the software may be able to work out what was said from the surrounding words, even if it didn’t recognize every word. For the same reason, words spoken out of context are harder to recognize.
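To make the idea of training on repeated keywords concrete, here is a minimal sketch of one classic approach, template matching with dynamic time warping (DTW). It is older and simpler than the statistical models commercial dictation products use, and nothing here describes any vendor’s actual engine. It assumes the audio has already been turned into a sequence of numeric feature vectors, one per short slice of time; the class and function names (`TemplateRecognizer`, `dtw_distance`) and the toy one-number “features” are hypothetical. Each training repetition is stored as a template, and a new utterance is labeled with the word whose template aligns with it most cheaply, even if the user speaks faster or slower than during training.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: one feature vector per short slice of audio.
Frame = Sequence[float]
Utterance = List[Frame]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a: Utterance, b: Utterance) -> float:
    """Cost of the best time alignment of a and b (dynamic time warping).

    Either utterance may be stretched or compressed in time, so a slowly
    spoken word can still match a quickly spoken template of the same word.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical speaker-dependent recognizer trained by repeating keywords."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Utterance]] = {}

    def train(self, word: str, utterance: Utterance) -> None:
        """Store one spoken repetition of a keyword as a template."""
        self.templates.setdefault(word, []).append(utterance)

    def recognize(self, utterance: Utterance) -> str:
        """Return the keyword whose closest template has the lowest DTW cost
        ("" if nothing has been trained yet)."""
        best_word, best_cost = "", float("inf")
        for word, examples in self.templates.items():
            for example in examples:
                c = dtw_distance(utterance, example)
                if c < best_cost:
                    best_word, best_cost = word, c
        return best_word


# Toy "training session" with one-number features per frame.
rec = TemplateRecognizer()
rec.train("open", [(1.0,), (3.0,), (3.0,), (1.0,)])
rec.train("close", [(4.0,), (4.0,), (0.0,), (0.0,)])

# A slower "open": same shape as the template, stretched in time.
print(rec.recognize([(1.0,), (1.0,), (3.0,), (3.0,), (3.0,), (1.0,)]))  # open
```

Because the templates come from one person’s voice, a recognizer like this works well for that speaker and poorly for others, which is one reason the training step matters.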
Many vendors claim that their PC-based speech recognition systems have relatively high accuracy rates of 80 to 99 percent. One fact they don’t always mention in the sales pitch, however, is that many users give up when they learn that training the software can be a difficult task that takes practice and patience.
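Vendors rarely say exactly how they measure accuracy. A common yardstick in speech recognition is word error rate: the minimum number of word substitutions, deletions, and insertions needed to turn what the software typed into what was actually said, divided by the number of words actually said. Word accuracy is then 1 minus that rate. The sketch below assumes that definition; the function names and the sample sentence are illustrative only. It shows what the low end of the vendors’ range means in practice: at 80 percent accuracy, a ten-word sentence comes back with about two mistakes to fix.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (edit distance over words)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(prev[j] + 1,                           # deletion
                          curr[j - 1] + 1,                       # insertion
                          prev[j - 1] + (ref_word != hyp_word))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - word error rate; assumes the reference transcript is non-empty."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    return 1.0 - word_errors(ref, hyp) / len(ref)


said = "please send the quarterly report to the finance team today"
heard = "please send the quarterly report to the fine ants team today"
print(f"{word_accuracy(said, heard):.0%}")  # 80%
```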
“The technology is still going through the maturation curve and the user adoption curve,” noted Gallahan. “It’s not Star Wars. You can’t log onto my Web site, order the product, and 10 minutes later be creating documents off the top of your head just by speaking freely. It’s kind of like getting a new puppy…a retriever. The capability of that puppy turning into a well-trained retriever is there. It doesn’t just organically happen. You have to work with it.
“Speech recognition is somewhat the same,” Gallahan continued. “It takes time to get it to pick up on the acoustics of your voice. If you’re serious about it, give it the full training period, which is about 45 minutes. You’re going to read text to it, and as a result, it learns the specific acoustic delivery style of your voice so it can understand those 30 or 40 phonemes in the English language. After it understands the acoustics of your voice, it wants to understand how you use words in combinations with one another. The better it can learn how you use words together and your word preference, the better it is able to make guesses and predictions.”
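Gallahan describes two stages: learning the acoustics of your voice, then learning how you use words together. The second stage is what a language model does. As a minimal sketch, assume the acoustic side has already narrowed a spoken word down to a few candidates that sound alike (“to,” “two,” “too”). A simple bigram model built from documents the user has written can then pick the candidate that most often follows the previous word. The class name and sample documents are hypothetical, and real products use far larger and more sophisticated models; add-one smoothing is used here only so that unseen word pairs count as unlikely rather than impossible.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class BigramModel:
    """Hypothetical per-user language model: which word tends to follow which."""

    def __init__(self) -> None:
        self.follows: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def learn(self, documents: Iterable[str]) -> None:
        """Count adjacent word pairs in text the user has already written."""
        for doc in documents:
            words = doc.lower().split()
            self.vocab.update(words)
            for prev, nxt in zip(words, words[1:]):
                self.follows[prev][nxt] += 1

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) with add-one smoothing (the extra 1 reserves room for unseen words)."""
        counts = self.follows.get(prev, Counter())
        return (counts[word] + 1) / (sum(counts.values()) + len(self.vocab) + 1)

    def pick(self, prev: str, candidates: List[str]) -> str:
        """Choose among words the acoustic model cannot tell apart."""
        return max(candidates, key=lambda w: self.prob(prev, w))


model = BigramModel()
model.learn([
    "please send the report to finance",
    "forward the invoice to finance by friday",
    "we need two copies of the contract",
])
print(model.pick("report", ["to", "two", "too"]))  # to
print(model.pick("need", ["to", "two", "too"]))    # two
```

The more of the user’s own writing the model sees, the better its counts reflect that person’s word preferences, which matches Gallahan’s point that the software gets better at guessing as it learns how you use words together.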
More vendors, more offerings
A growing number of vendors are selling in the speech recognition space, and as a result, the technology is being put to new and more ambitious uses.
BBN Technologies, a unit of GTE Corp., is developing technology that can transcribe audio from any source, index it, identify speakers, detect numeric data such as dollar amounts, and categorize content by topic. Dragon Systems (recently purchased by Lernout & Hauspie) and IBM have similar technology.
IBM will soon introduce technology for “conversational computing,” in which users command computers to fulfill orders, respond to account queries, create e-mail, or send faxes. Big Blue has also formed a strategic alliance with Nokia to promote open industry standards for speech technologies and to make speech a common interface for mobile devices.
One of the most common current uses of speech recognition, and one that the IT manager may want to recommend to executives, is speech-enabling call center functions. Because the technology can now be integrated into larger, more complex solutions, companies are using it to deliver help desk technical support, HR benefits information, and telephone access to the company’s Web site. Giga Information Group says this segment of the speech interaction market has begun to attract attention and predicts that it will be widely used in top enterprises by 2002.
“There’s going to be a high growth in applications based on the increased use of cell phones and Internet and the need to access Web content” in a simple, user-friendly way, said Elizabeth Herrell, a senior analyst with Giga. “The telephone will be one of the simplest ways to do that. The end user will be able to call into a Web site, ask for information, and have it sent back to them. I think for a very mobile society, that’s going to be important.”
Speech recognition technology has found a welcome home in the workplace on several platforms:
- Personal computer: users can navigate around applications, create documents, schedule appointments, send and receive e-mail, and browse the Web.
- Telephone: a caller can look up numbers through automated directory assistance, listen to e-mail via text-to-speech software, and browse the Web.
- Handheld devices and PDAs: users can enter data, browse the Web, and record audio that is later downloaded to a PC and transcribed by speech-to-text software.
How to implement speech recognition
With widespread usage of speech recognition technology expected in the workplace over the next several years, how should the IT manager direct usage throughout the organization?
Giga’s Herrell predicts that the large enterprise will be the first market to embrace speech recognition on the desktop, mainly because those organizations have the money and the manpower to implement the technology on a wide scale.
“On the high end, the larger speech engines have very good speech recognition. The consumer products that are out there today, the low-end things you buy for $150, those still need to be trained. They have less robust engines; they do an okay job. Nuance, SpeechWorks, they do a very good job of getting these engines up and running for large companies. Those technologies have come a long way.”
Experts say the IT department would be wise to treat speech recognition like any new technology—be the point of introduction to users, train them on the programs, and choose the software that best matches the task at hand and is compatible with the network. Philips’ Gallahan recommends that IT managers make speech recognition functional for everyone in the organization, but suited to their individual needs.
“The CEO could use the technology to browse the Internet; the people in finance may be using an application for financial databasing, where they’re mining the database via the voice rather than having to work the keyboard,” he noted. “Your average worker bees may use a speech-enabling product to create documents and e-mails by speaking those documents rather than relying on the keyboard.”