Speech technology: Does it matter to SMBs?

Computer voice command and speech recognition are "sexy" technologies that everyone likes to talk about, but are they ready for prime time – especially in the SMB environment?

Last month, I attended the MVP Summit at the Seattle Convention Center and the Microsoft campus in Redmond. Bill Gates gave the keynote speech for the first time in a couple of years, and his talk made it clear that speech is one of the technologies he's most excited about. Windows Vista has a much-improved voice recognition engine built into the operating system, and I believe Microsoft will focus even more closely on speech in the future.

Voice command, speech-to-text, and text-to-speech are the kinds of esoteric features that make good copy; the whole idea conjures up images of sci-fi movies set on other planets in the twenty-third century, where keyboards and mice are a thing of the past and we interact with computers in the same conversational manner we use with fellow human beings. But what about the here and now? What business value, if any, does speech technology offer SMBs today? Let's look at what's available now, what's coming soon, and how it fits into your business strategy.

How speech technology works

There are three main parts to speech technology:

  • Voice command: A set of predefined spoken commands is recognized by the computer and used to perform tasks normally done by clicking menus and buttons or typing keyboard shortcuts (for example, opening a document, running a program, or saving a file).
  • Speech-to-text: The computer "listens to" the words spoken and transcribes them to print on the screen, usually within a word processing program such as Microsoft Word.
  • Text-to-speech: The computer "reads" text documents aloud, outputting the speech through the computer's sound card.
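To make the difference between voice command and dictation concrete, here is a minimal Python sketch. It assumes a speech engine has already turned the user's audio into a text transcript; the command phrases, the COMMANDS table, and the handle_utterance function are hypothetical illustrations, not part of Vista's or any other product's API.

```python
# Hypothetical command table: each predefined spoken phrase maps to the
# action a user would otherwise trigger with a menu, button, or shortcut.
COMMANDS = {
    "open document": "File > Open",
    "save file": "File > Save",
    "run calculator": "Start > Calculator",
}


def handle_utterance(transcript: str) -> str:
    """Return the action for a recognized command, or a dictation fallback.

    Assumes `transcript` is text already produced by a speech engine.
    """
    # Normalize case and surrounding whitespace before the lookup. Real
    # engines do far more (grammars, confidence scores), omitted here.
    phrase = transcript.strip().lower()
    action = COMMANDS.get(phrase)
    if action is None:
        # Anything that isn't a predefined command is treated as
        # speech-to-text input instead.
        return "Not a command: treat as dictation"
    return action
```

For example, handle_utterance("Save File ") returns "File > Save", while an arbitrary sentence falls through to dictation. A fixed list of commands is why voice command is far easier to get right than open-ended speech-to-text: the computer only has to choose among a handful of known phrases.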

Advantages of speech technology to SMBs

A generation ago, executives and professionals didn't need to know how to type. They routinely dictated their correspondence to secretaries who took shorthand or into tape recordings that were later transcribed by those in the "typing pool." Information was entered into computers (if the company had computers) by data entry clerks.

Today keyboarding is taught in elementary school, and most office workers have some level of typing skill. However, many still aren't highly proficient, and we can all speak more quickly than we can type. In addition, carpal tunnel syndrome makes it painful for many people to type on a computer keyboard for long periods. And there are situations in which we can't (or shouldn't) operate a keyboard, such as when we're driving. We can safely speak while driving, though, and could get work done if we could input information to our computers by voice.

Speech technology also makes it easier for people with disabilities to use computers. Those with severe arthritis or with injured or amputated hands can use speech recognition to input data, and the blind can use text-to-speech to have electronic documents read to them.

Speech technology is also an important part of the unified communications model, by which workers can, for example, call in to their email servers and have their messages read to them over the phone, dictate replies, or use voice commands to delete, move, or forward messages.

Drawbacks of speech technology for SMBs

Although using your voice may seem like the most natural and easiest way to interface with your computer, it may not work as well in practice as it does in theory.

Pronunciation differences and homonyms

One problem is the huge range of differences in human voice tone, accent or dialect and individual pronunciation. People from one part of the country may pronounce "oil" as "awl," while those in another region say "Cuber" for "Cuba."

To complicate matters further, the English language has many homonyms (more precisely, homophones): words that sound the same but are spelled differently (e.g., there, they're, and their, or to, two, and too).

This is why speech-to-text is the most difficult of the three speech technologies to implement well. It's a tremendous challenge for the computer to accurately recognize words that different people pronounce very differently, or to distinguish between words that sound exactly alike.

The result is that transcribed speech often includes many errors that have to be corrected, either by keyboard or by voice. Once you factor in the time spent on error correction, most touch typists find that they can create a finished document using the keyboard more quickly than they can do so using speech.

However, speech engines have been improving steadily, and error rates are coming down. This is especially true if users take the time to "train" the speech engine extensively, which is done by reading many passages to the computer so it can become familiar with your voice tone and pronunciation. It takes longer to train the computer if you have a heavy Boston or Texas accent, for example, than if you have standard Midwest "TV anchor" speech patterns, but it can be done.

If you're considering implementing speech-to-text, you need to allow sufficient time for voice training and assess your users' current typing speed to determine whether speech will improve their productivity or slow them down.

The business environment

Another consideration is the physical environment in which users work. If everyone has a private office, or if workers are separated by soundproof barriers or widely spaced, speech technologies will work better. If you have workers sitting close to one another in a crowded space, one user's microphone may pick up what's being said at the next desk and transcribe it. You can reduce this problem by having workers use headset microphones rather than desktop models (headsets will also usually greatly increase the general accuracy of speech recognition).

However, even if microphones are sufficiently directional, having all your workers talking to their computers constantly may create a noisy, chaotic atmosphere that's unpleasant to work in and leads to loss of concentration and lowered productivity.


Speech technology is, in a way, the "holy grail" for those who design user interfaces. The technology has come a long way, and software companies are dedicated to making it better. With a pretty good speech recognition engine now built into Vista, more applications are likely to be written to take advantage of speech.

Speech is undoubtedly the wave of the future and the day will probably come when it's the standard way to interact with computers. But before you throw away all the keyboards at your small or midsize company, assess your employees' skills and how they work to determine whether using speech technology will benefit them.

You may find that speech is appropriate for some workers and not others, and that for many people, a combination of speech and keyboard input (for example, using voice commands but typing long documents in the traditional way) is the most efficient use of speech technology at this point in time.


Debra Littlejohn Shinder, MCSE, MVP is a technology consultant, trainer, and writer who has authored a number of books on computer operating systems, networking, and security. Deb is a tech editor, developmental editor, and contributor to over 20 add...

